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ABSTRACT 


This research investigates the problems inherent in Decision Support Systems 
(DSS) that depend on the quality and accuracy of legacy information as the basis for 
decision-making. A Spatial Decision Support System (SDSS) was developed at Naval 
Postgraduate School to analyze the comparative desirability of Army Reserve Unit 
locations. The Army Reserve Installation Evaluation System (ARIES) integrates a GIS 
mapping engine and a decision model solver in a flexible environment that leverages 


operational legacy database information for decision-making. 


Data quality problems from legacy sources motivated the development of a data 
migration plan to transform the source data into an architecture optimized for the ARIES 
SDSS application. This research developed a prototype Data Migration Tool (DMT) to 
extract the relevant source data into a centralized repository for the SDSS with an 
acceptable degree of data quality to support SDSS outcomes. Six data quality attributes 
were identified: accuracy, completeness, consistency, timeliness, uniqueness, and 
validity. The ARIES DMT focused on data validity and developed techniques for 
measuring and enforcing data validity. The DMT also specified individual 
responsibilities for data administration, development of data retrieval routines, and data 


quality assessment. 


Significant system performance enhancements resulted from implementation of 
the DMT by leveraging the spatial aspects of the underlying repository through 
geographic queries that efficiently localized subsets of the data files. Additional 
performance enhancements were obtained through the use of data warehousing 


techniques. 
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| INTRODUCTION 


A. GENERAL 


This research analyzes the problems inherent in Decision Support Systems (DSS) 
that rely upon legacy databases as the primary data source. The quality and accuracy of 
an outcome that any DSS returns cannot be better than the quality and accuracy of the 
underlying database information. We explore this premise in the context of a prototype 
Spatial Decision Support System (SDSS) developed for the United State Army Reserve 
Command (USARC) that allows analysis of the comparative desirability of Army 
Reserve Unit locations. Since many DSS’s are model-based, initial development often 
focuses on specification of the underlying model(s) and associated user interfaces. Issues 
concerning the data required to run the models are frequently left until the latter stages of 


development. 


Initial implementation of the USARC SDSS took exactly this approach and, as a 
result, encountered serious problems with the underlying data that compromised the 
quality of the decisions that the SDSS was able to render. Significant measures were 
required to resolve these problems; specifically an entire data administration module had 
to be developed to identify meta-data and regulate the extraction of DSS data from source 
data files. This module required the adoption of procedures to assess and monitor the 
quality of the data as it passed from the source legacy databases to the databases used as 
input to the SDSS. The spatial nature of much of the data put a special twist into this 
process since the SDSS application takes advantage of these spatial aspects to streamline 
system performance. This research explores the confluence of data quality, decision 
support, legacy data, and spatial data, and prescribes procedures for dealing with data 
quality in SDSS development. A major lesson learned was data quality must be 
addressed at the beginning of (S)DSS projects and not left until the end of the 


development cycle. 


B. BACKGROUND 


The Force Support Package (FSP) Readiness Office, a component of the U.S. 
Army Reserve Command (USARC), is tasked with assessing and improving the readiness 
of priority Troop Program Units (TPU). A TPU is the foundation of the Army Reserve 
force, ranging from 50 to 250, typically consisting of about 150 reservists. The TPUs 
that are in the FSP, which contain the units designated for rapid deployment, are of most 


concern to the Readiness Office. 


Readiness, in this context, refers primarily to personnel readiness, i.e., the ability 
to maintain troops that are properly trained and qualified individuals in a sufficient 
number. Many of the numerous factors that affect readiness are dependent on the 
location of the unit. Relocation of a unit to another facility can, at times, be the best 
solution when a unit is struggling to maintain personnel readiness. During today’s 
environment of force reductions and realignments, relocation may also be necessary to 


support force consolidation or restructuring efforts. 


Previously these decisions were based upon a combination of personal expertise 
and narrowly focused studies. This ad hoc process produced results that often proved 
difficult to communicate, defend, and build consensus around. The human decision- 
maker becomes overloaded quickly by the large number of factors involved in the TPU 
relocation decision without the aid of an automated decision tool. The inadequacies of 
the current approach to provide any detailed solutions to such a complicated problem 
inspired the search for a convenient and systematic automated tool that could be based 


upon a decision model. 


The use of computer based DSS’s to aid the decision-maker in making thorough 
and informed decisions will become more prevalent as the use of distributed working 
environments increase. Distributed environments allow access to more information that 
will increase the overall effectiveness of decision support tools. Experience with the 
USARC SDSS indicates that significant attention must be paid to the quality of data 


underlying decision systems in order to ensure the quality of the resulting decisions. 
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Cc. THESIS OBJECTIVES 


The primary objective of this thesis is to identify problems with developing an 
SDSS based upon legacy databases with a high variance in data quality. A secondary 
objective is to develop an application design process that, by incorporating data 
warehousing techniques, can counteract the effects of poor data quality on the resulting 
application. This involves analyzing the development process used for the current 
prototype, identifying the relevant data quality factors, reviewing data warehousing 
techniques, applying those techniques to address data quality problems in the prototype 
application, and examining lessons learned from the prototype development process. The 
research questions that will be addressed are: 

e What inherent problems are involved in the use of legacy database 

information in the development of a state of the art DSS? 
e What are the relevant data quality factors for site location decision problems? 
e What data warehousing techniques are relevant to the SDSS design process? 


e What steps should be taken during the design and development process to 
ensure that the data quality will support a level of confidence required by the 
user in the outcome decision? 


e Who should be responsible for the level of data quality involved in the 
development of an SDSS? 


e What are the unique problems that spatially enabled data present to the level 
of data quality? 


D. SCOPE 


This study will focus on the SDSS developed for USARC to support the unit 
location decision problem and unit readiness mission responsibilities. The automated 
decision tool supports the process of relocating units that are not meeting readiness goals 


to sites that afford them better opportunities to succeed. 


The USARC prototype, the Army Reserve Installation Evaluation System 
(ARIES), has a number of external restrictions imposed that limit the true effectiveness of 


the system. For example, only those facilities currently owned by the Army Reserve are 


considered as potential relocation sites (approximately 1,500 nationwide). The 
discussion of data quality will be in reference to the data that supports the decision factors 


of these facilities. For further details of the ARIES project refer to references 1 and 2. 


The original project requirements intended to avoid any extensions to existing 
data maintenance responsibilities. USARC also specified that all model inputs would be 
drawn from existing data sources. ARIES provides the decision-maker the ability to 
manually input data needed to support additional decision criteria for incorporation in the 
evaluation process. This off-line analysis would inject even more concern for data 
quality and the subsequent confidence level of subsequent decisions that can be reached 
using this tool. This research does not address data quality issues that arise from this 
source of “ad hoc” data; rather it focuses on the data quality of the “feeder” legacy 


databases and their extracted counterparts in the SDSS. 


The research sponsor did not define adequately the ownership associated with the 
data that was used to support the majority of the decision criteria. The lack of a 
responsible custodian left the interpretation of many of the data fields up to the designers 


and their ability to ascertain the meaning of the underlying database schema. 


E. ORGANIZATION OF THE STUDY 


The balance of this study is organized as described below. Chapter II discusses 
the design process and architecture used in the development of the ARIES prototype 
project. Chapter III discusses the basic characteristics and elements involved in data 
warehouses, data marts and issues with data quality. Chapter IV details how these data 
warehousing techniques were implemented in the ARIES SDSS application. Also 
discussed in that chapter are the problems with the availability and quality of the database 
information used in the decision process that surfaced throughout the production of the 
prototype user interface. Chapter V provides a number of post application design issues 
and recommendations for further study that could assist future SDSSs of this type. 


Chapter VI presents conclusions and the contributions of this study. 


i. SDSS ARCHITECTURE DEVELOPMENT 


A. THE ARIES PROJECT 


This chapter describes the design process and architecture of the ARIES SDSS 
project developed at the Naval Postgraduate School (NPS) for the Army Reserve 
Command. The ARIES Development Process is depicted below in Figure 1. At the heart 
of the ARIES architecture is a decision model that was developed in conjunction with a 
group of experts at the FSP office. The decision model produces a list of decision criteria 
that must then be mapped to operational source databases. This mapping process was 
done by identifying business rules for each criterion in the form of queries. These 


business rules became the basis of the data model used in the application. Once the 
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Figure 1. ARIES Architecture Development Process 
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decision model and data model were finalized, development of the system and user 


interface began. 


B. DECISION MODEL 


The core of any DSS is one or more decision models. The ARIES decision model 
is a multi-criteria model represented as a hierarchy of objectives or goals with associated 
measures or criteria involved in making a specific decision. An objective is referred to in 
most decision literature as a desired direction and a goal as the quantifiable progress in 
that direction. For the purposes of this discussion, we will adopt the terminology of goals 
and measures that is consistent with the decision software package used in the ARIES 


project. 


To begin the design of a decision model, a detailed elicitation process is required 
to capture the characteristics and aspects associated with the specific decision problem 
being modeled. This process identifies a top-level decision goal that is subsequently 
refined to layers of subordinate goals. The subordinate goals and their associated 
decision measures must be arranged in a hierarchy that allows the final analysis to arrive 


at an evaluation for the top-level goal. 


1. Decision Process Elicitation 


The first step in modeling the TPU location decision problem was to gather a 
group of knowledgeable experts in the area of Army Reserve manpower and identify the 
top-level, or overall goal. The use of experts rather than an extensive study was adopted 
in the interest of cost savings as well as the ability to develop a working prototype 
decision model in a Short period of time. The elicitation process was done by focusing on 
factors that were identified in prior research, Reference 1, of the TPU readiness issue as 


well as the process knowledge of the experts. 


The expert panel, consisting of USARC personnel, was able to identify an overall 


goal that related unit location to unit readiness. This was a challenging process because 
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of the difficulty of placing a measurement on readiness. Eventually the expert panel 
settled on an overall goal of site desirability. The panel decomposed site desirability into 
two subgoals, personnel readiness support (the ability to maintain the desired number of 
qualified reservists at the proposed site) and facility quality (a general assessment of the 


costs and benefits of a location that are only loosely related to readiness). 


This approach to determining decision goals was initially done without any 
concern for the availability of data that would subsequently support the model. The 
elicitation process also did not involve any formal review of the current process for 
making this decision. Rather, it was an effort to determine the ideal decision process for 
TPU readiness in the context of unit location. A review of the existing decision process 
would have identified information currently used to make an informed decision which 
could later become the foundation for building a data model. The lack of such a detailed 


data model proved to be an obstacle to the project. 


2. Decision Goals and Goal Hierarchy 


The overall decision goal of Site Desirability was broken down into two subgoals, 
Facility Quality and Personnel Readiness, which were in turn refined further into either 
additional subgoals or decision measures. Decision measures are the basic elements of 
the model to which a single objective value can be assigned. Each subgoal must be 
ultimately broken down into these basic elements to allow the multi-criteria decision 
making to occur. Table 1 shows the breakdown of the facility quality and personnel 
readiness decision goals into decision measures for the ideal decision model. Subsequent 
discussions revealed that that data did not exist for some of the measures so they were 
dropped from the initial model. Table 2 shows the decision measures could to be 


implemented with available data. 


The Facility Quality subgoal is used to describe specific attributes of a proposed 


facility (i.e., the building and the real estate). These values are primarily extracted from 


SITE DESIRABILITY 


I. Facility Quality 
% Administrative Space FT 
% Administrative Space PT 
% Motorpool Space 
Distance to Headquarters 
Facility Age 
Facility Maintenance Backlog 
Facility Condition 
Facility Operating Costs 
Facility Leased/Owned 
II. Personnel Readiness 
MOS Qualification 
Available Prior Service 
Available MOS from Closing Units 
Available MOS from the IRR 
Fill Level 
Market Supportability 
Market Quality 
Civilian Labor Market 
Closing Unit Transfers 
IRR Availability 
Recruit Market Size 
Area Units 
Area Drill Attendance 
Area Loss Rate 
Area Transfer Rate 
Average Area Manning 
Distance to Recruiter 
Reassignments 
Competition 
Training Support 
Equipment Readiness 
% Storage On-Site 
Distance to AMSA 
Distance to ECS 


Training Facility 
% Facility Usage 
Distance to Special Training 
Distance to WET Site 
Distance to Weapons Range 
Facility Weekend Use 
Table 1. Complete Hierarchy of Goals for Site Desirability 
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SITE DESIRABILITY 


[. Facility Quality 
Facility Age 
Facility Maintenance Backlog 
Facility Condition 
Facility Operating Costs 
Facility Leased/Owned 
II. Personnel Readiness 
MOS Qualification 
Available Prior Service 
Available MOS from Closing Units 
Available MOS from the IRR 
Fill Level 
Market Supportability 
Market Quality 
Closing Unit Transfers 
IRR Availability 
Recruit Market Size 
Area Units 
Area Drill Attendance 
Area Loss Rate 
Area Transfer Rate 
Average Area Manning 
Distance to Recruiter 
Reassignments 
Competition 
Training Support 
Equipment Readiness 
Distance to AMSA 
Distance to ECS 


Training Facility 
Facility Weekend Use 


Table 2. Goal Hierarchy (showing only those measures with automated inputs) 


databases maintained by the Army’s Corps of Engineers and describe the age, condition, 


capacity, and costs associated with the major structures of the site. 


The Personnel Readiness subgoal is used to determine the ability of the area to 
support personnel readiness. Personnel readiness was broken down into two subgoals, 
Fill Level and Military Occupational Specialties (MOS) Qualification Level. Fill Level 
indicates the ability of the area surrounding a site to support a sufficient number of 
reservists whereas MOS Qualification Level indicates the availability of the skill set 
required by the moving unit in the area of the proposed site. Each of these goals is 
further broken down generating a hierarchy of the goals and measures that make up the 


actual decision model. 


The resulting hierarchy of goals represents the location-related factors that were 
determined by consensus of the expert panel to be important in the TPU relocation 
decision. This goal hierarchy, shown in Tables 1 and 2, is used by the multi-criteria 


decision solver to obtain a final evaluation of the desirability of each site. 


3. Decision Measures 


A decision measure is the result decomposing of each goal in the hierarchy into 
objective inputs that can be qualified and assessed. These objective inputs can come 
from various sources such as databases, spreadsheets, data analysis, etc. The hierarchy 
developed by the expert panel allowed most of the inputs to come from existing database 


information, minimizing the involvement of the user. 


The decision analysis software integrates all the dissimilar dimensions of the 
measures by obtaining a common unit value for each decision measure. The common unit 
value is arrived at through the use of yield curves for the decision measures. A relative 
weight is also applied to each goal to denote the level of importance of that goal 
compared to other goals. As a result, certain nodes in the hierarchy can be calibrated to 
affect the outcome more strongly than others by assigning them higher weights. These 


values are then summed for each goal to determine the overall desirability. 
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Appendix A gives a detailed summary of each measure including the definition, 
source information used, resulting queries, and associated yield curve. These decision 


measures became the foundation that for the application’s data. 


C, DATA MODEL 


“Good decision support requires an integrated, stable, well-managed data 


resource.’ [Ref. 3:p. 267] 


A DSS requires data sources from which to draw information that will fully 
support the underlying decision model. For the ARIES decision hierarchy to provide 
acceptable confidence levels for the resulting decisions, it must be based upon objective 
historical data. .USARC stipulated that the ARIES application should minimize any 
associated data management. Specifically they required that ARIES give rise to no new 
data administration responsibilities. Further, they specified that all database information 
used must be available or easily transferable to the USARC Local Area Network (LAN), 
and the application should be able to retrieve information from those available data 
sources without regard for their location. These basic requirements formed the 


foundation from which the ARIES data model was developed. 


Using the goal hierarchy and the resulting decision measures, steps were taken to 
identify “business rules” for each decision measure that could be translated into objective 
equations. These business rules were also derived by consensus of the expert panel. The 
business rules allow source data elements to be identified that provide an objective 
assessment of a measure and therefore automate the site evaluation process. Once the 
required data elements are identified, the set of data files required to support the 


application are also identified. 


Je Developing the Business Rules 


Using the ideal decision hierarchy and resulting decision measures, the expert 


panel documented the factors or elements comprising each measure (i.e., Average Area 
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Manning = Number of Personnel Assigned / Number of Personnel Required). In 
addition to developing an objective rule for each measure, it was necessary for the expert 
panel to define each element that made up this business rule. These definitions are used 
to determine the actual data that are required to support each decision measure. A 
complete description of each measure and the associated business rule is contained in 


Appendix A. 


Di Mapping the Business Rules to Real Data 


Each equation definition identified required individual data elements. These data 
elements were then mapped to operational data elements available in database files on the 
USARC LAN. It was determined that ten of the decision measures in the ideal decision 
hierarchy did not have readily available data that could support the model. These 


measures were not automated in the final prototype application. 


Logic diagrams were drafted for each measure using the business rule and the 
identifying source data files. This information was gathered through discussions with 
individuals at USARC headquarters familiar with the requisite database information. 
Who were able to identify specific files that would contain the required data elements for 
each measure. Some information such as census information and facility information 


were determined to be available through other sources such as the Corps of Engineers. 


Because of the geographic nature of some of the decision measures (e.g., Area 
Loss Rate), it became apparent that a spatial dimension would be necessary in the final 
application. This requirement for area-specific data led to the integration of a Geographic 
Information System (GIS) to aid in the selection, querying, and visualization of this 


decision problem. 


3. Identify Source Data 


The source data identified include several types of database files, transactional 


data, spatial data, personnel data, historical data, and analysis data. This wide spectrum of 
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dissimilar database types posed a challenge in refining the logic developed for each 
measure. In most cases these source databases were being used and maintained by 
different entities within the USARC headquarters facility for their own use. The initial 
data sets that were collected for the prototype were extracts of these databases for the 
state of Pennsylvania. As development progressed the full national databases were 
collected and integrated into the project. A description of all the source databases is 
contained in Appendix B with the meta-data information available for each file. The 
initial development plan was to draw information directly from each source file during 
each individual site evaluation session. This process was found to be inefficient as 


discussed later in the chapter. 


D. SYSTEM ARCHITECTURE 


With the Decision Model complete and the Data Model specified, an automated 
decision application could then be developed to integrate the decision model’s 
information needs with the knowledge of available information in the data model. The 
Graphical User Interface (GUI) accepts the required inputs for the problem from the user 
and conducts the evaluation of the defined scenario. This basic architecture is depicted in 


Figure 2. 


It became clear early in the project that budget and time limitations would not 
allow the development of an application that would carry out all functions of this project 
independently. This led to the integration of several commercial of-the-shelf (COTS) 
products to conduct the decision analysis and assist in the GIS portion of the project. The 
ARIES application architecture consists of four components: an integrating shell, a 


mapping engine, a decision model solver, and a data preprocessor. 
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Figure 2. ARIES System Architecture 


Je Integrating Shell 


The application shell that integrates and operates the GUI is original code written 
in Visual Basic™. Visual Basic™ is an event-driven programming language that allowed 
the integrating shell to be developed to integrate into the infrastructure already in place at 
USARC. Visual Basic™ was chosen as the programming language because the USARC 
information system support personnel were maintaining other applications with it and 
already had a basic level of understanding. This would allow for future maintenance and 


improvements to the prototype to be completed in house by USARC personnel. 


Another USARC requirement for the prototype was that the final application 
should relieve the user of the burden of understanding the individual COTS applications 
and protocols involved in the transfer of information. Because of the predictable and 
structured nature of this decision process automation of most of the tasks was very 
effective. The only required inputs from the user are the moving unit identification code 
(UIC) and the facility identification code (FACID) for the proposed sites. Figure 3 shows 


the ARIES User Interface screen used to capture the input parameters. 
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Figure 3. ARIES User Interface Screen for Specifying Parameters 


The GUI will accept these inputs either as manual inputs or from the map display. 
The overarching shell uses a set of predefined tasks based'on the decision model to 
acquire the database information for each decision measure. Some of these tasks are 
carried out through an Objected Linking and Embedding (OLE) connection with the 


mapping engine and others are carried out using the database engine in Visual Basic™. 


Once the shell has obtained values for all the decision measures associated with 
each proposed site, this information matrix is passed to the decision solver. The decision 
solver carries out its evaluation and passes control back to the GUI where the user has the 


ability to print reports, conduct dynamic analysis, or consider another scenario. 
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ap Mapping Engine 


MapInfo™, already in use at USARC, was chosen as the mapping engine for 
several reasons. MapInfo™ satisfied all the known and anticipated functional 
requirements, 1t was already owned by USARC, had proven to be well supported and 


documented, and would minimize the need for additional training. 


MapInfo™ is a commercial mapping package that is used as a graphical input tool 
and provides for the spatial definition and processing of data. It converts positions to 
distances, makes proximity determinations, and classifies objects by geographical region. 
The integrating shell uses the OLE connection to pass data to and from MapInfo™ and 
launch a MapBasic™ program that executes the spatial queries. The ability of 
MapInfo™ to localize data from huge databases provided a significant performance gain 


when the spatial queries were implemented. 


3. Decision Model Solver 


A decision solver was required that would conduct multi-attribute utility analysis 
and allow for “what-if” dynamic analysis functionality. Logical Decision for Windows™ 
(LDW) is used as the decision solver in the ARIES application. LDW™ was chosen 
primarily for its superior implementation of the underlying decision framework, Multi- 
Attribute Utility Theory, and its ability to provide a flexible decision analysis 


environment. 


-LDW™ was determined to be superior than other similar products in terms of 
overall ease of use for the novice user. LDW™ supports for a wide range of techniques 
to obtain user preferences (e.g., ordinal criteria ranking, tradeoffs, direct graphical and 
tabular inputs). The application also allows the user to set the specific information about 
the yield curve that affects each decision measure to include: slope, continuous or 


discrete, minimum and maximum values, and shape. Another important feature that 
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LDW™, particularly for the ARIES prototype, is the ability to conduct dynamic 


Sensitivity analysis of an evaluation session. 


Given these fundamental strengths, LDW™ did have limitations in its ability to 
communicate with the other applications. The ARIES application must pass control to 
LDW™ when the decision analysis phase begins to allow LDW™ to work. The 16-bit 
architecture of LDW™ limited the available control methods allowing key-stroke 
passing as the only means to control the program externally. This limitation requires that 
the user be familiar with and be able to carry out some functions within LDW™ in order 
to take full advantage of the capabilities of the decision model solver. Through the use of 
these methods of passing control and information the basic evaluation of a single site 


location problem is fully automated to include report output. 


The ARIES shell passes the subjective values for each decision measure to 
LDW™ for evaluation against the stored default preference set of the goal hierarchy. 
This is done through a text file because of limitations in LDW™’s capacity to interact 
with other applications. LDW™ receives the matrix of values with the facility name and, 
using the stored yield curves and assigned weighting, evaluates the specific scenario. The 
user can either print the standard reports or carry out further analysis of that scenario 


using the LDW™ application. 


4. Data Preprocessor 


The final component of the system application, the data preprocessor, evolved 
from the need to have the operational data move smoothly into the ARIES evaluation 
process. The data preprocessor, like the shell, is written in Visual Basic™. Even if all 
source databases were consistent and accurate, their number and sizes present 
considerable performance challenges for a PC-based, front-end processor. Because of the 
size and the varying location of the data files involved in the ARIES data model an 
application that would provide an administrative function for the source information was 


necessary. 
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E. DATA PREPROCCESSOR: ARIES ADMINISTRATOR 


The data preprocessor, known as ARIES Administrator, is the transition element 
that moves the operational data from its source form to a centralized data resource that the 
ARIES application can access. USARC’s initial requirement to maintain the current 
location of each source data file was the primary reason for developing this component. 
As the prototype development progressed, it became clear that all the data elements had 
to be assembled in one central location to facilitate an acceptable performance level 
during problem evaluation. This additional function was taken on by the Administrator 
which evolved into an extracting agent. For the Administrator to conduct an extraction of 
the source data files, queries had to be generated and maintained in order for the process 
to be duplicated as the data files changed. The ARIES shell and the Administrator are 
separate applications that are only connected by the requirements of the ARIES data 


resource file structure. 


hs Maintaining File Locations 


In order for the Administrator to find a file for the extraction process the file 
name, path, and type must be maintained. This information is entered in the 
Administrator under the File Location tab by using the standard windows file location 
interface. Figure 4 shows the information maintained for each database. The 
Administrator also provides, for informational purposes, a list of fields, table names, and 
table indices that must be present to support the processing performed by the ARIES 
shell. In addition, the Administrator also maintains a file location of the COTS 
applications, MapInfo™ and LDW™, to allow flexibility in the installation of these 


supporting applications. 


2. Query Development Process 


Development of extract queries became necessary to obtain an acceptable level of 


performance for the ARIES application. The initial extract queries were designed to 
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Figure 4. ARIES Administrator File Location Screen 


retrieve only the required fields and records and place that information in the Microsoft 
Access™ format. This would allow the integrating shell to take advantage of the 
database engine associated with Visual Basic™. Further development showed the need 
to add conditional queries that would filter unwanted and obviously bad data. Additional 
aggregate queries were added to improve the performance and efficiency of the ARIES 


application queries conducted during runtime. 


Si Data Extraction Queries 


As each query was developed, it was first tested in a stand-alone mode and then 
implemented into the data extraction process. The Administrator Extract Queries Screen 
is shown in Figure 5. These queries are stored in an Access™ table, named 


Administrator, and identified by the table name it generates for the ARIES data resource 
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Figure 5. ARIES Administrator Extract Queries Screen 


file. These queries can be edited by the administrator to accommodate changes in a 
source data file or future changes in the application. The extract table structure for each 
query, as required by the ARIES application, is documented in the Administrator under 
the Extract Information area and can be reviewed by the administrator. This documents 
the structure of the table required by the ARIES application so that the administrator can 


adjust the queries of the extract without affecting the workings of the application. 


4. Data Cleaning, Standardizing, and Extracting 


The Administrator became a mechanism to transform the original data into a 
consistent data source for the ARIES application. For this reason, many of the queries 
that were developed retrieve only the fields and record data required by the business rules 


for each measure. It also became necessary to standardize the naming of fields that 
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referenced the same data element because different data files used different naming 
conventions (e.g., UIC, UIC1, CURR _UIC). This standardization allowed the 
application code to remain consistent without concern for the naming conventions used in 
the source data files and also supported the functionality desired of allowing source files 
to change without having to change the associated application code. One final task that 
the Administrator incorporated was a basic cleansing process. Certain values that were 
identified during initial attempts to query the data as being out of scope or null were 
removed during the extract. This was accomplished by applying additional criteria to the 


extract queries. 


F. CHAPTER SUMMARY 


The overall SDSS architecture of a decision model, a data model, an integrating 
application system, and a data preprocessor provides simplified access to a set of 
powerful tools for decision support. These four components generate a working 
prototype application that is able to complete data analysis in several minutes that would 
otherwise have taken several groups of individuals many weeks. The decision model is a 
mapping of the desired decision process into a hierarchy of goals and decision measures 
that will allow subjective inputs to be achieved for each measure. The data model is 
generated by developing business rules for each decision measure and identifying source 
data to answer each of those rules. The integrating application system was designed to 
bring together the decision model and data model to generate an analysis of a given 
scenario. The final component, the data preprocessor, 1s the transformation agent used to 


prepare and condition the source data for use by the application. 


Although it was not a focus of the original project, this system development 
process gave rise the need for a data warehouse component. As the separate components 
came together under the original architecture, the system’s ability to manipulate data 
became a limiting factor. The data preprocessor became the transformation agent that 
was able to remove the required data elements from the operational source files. This 


preprocessing organized the available information in a format that is to optimized to 


21 


support the decision process. By cleansing, aggregating, and extracting from the source 
data files the data preprocessor generated a specialized form of what is termed a data 


warehouse. Chapter III discusses terms and issues surrounding data warehouses. 
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Ul. DATA WAREHOUSING AND DATA QUALITY: TOPICAL 
DISCUSSION 


One of the major consequences of the ARIES project development was the 
realization that it was necessary to centralize the location of the source data files. This 
specific user requirement was not identified at the beginning of the project, but rather 
evolved during the development process as a need to improve system performance during 
an evaluation session. The process of structuring and creating this centralized data 
resource resembles some of the current database strategies being used by organizations to 
take advantage of enterprise wide database information. This chapter provides an 


introduction to data warehousing, data marts and issues involved in data quality. 


A. DATA WAREHOUSING 


The idea of gathering and integrating all the operational information of an 
organization in one place for the purpose of conducting analysis has been a goal of many 
information mangers. Not until 1990, however, when W.H. Inmon coined the term data 
warehousing was there a formalized architecture or thought process for developing this 
strategic management tool. Data warehousing, when used properly, will “provide the 
decision maker of an organization with the timely information necessary to effectively 
make critical business decisions.” [Ref. 4:p. 3] Since 1990 this concept has continued to 
flourish and grow to the point where today the data warehousing industry is estimated at 
$15 billion annually, and 95% of the Fortune 1000 companies have built, or are in the 


process of building, data warehouses. [Ref. 5:p. 1] 


1. Definition 


The term data warehouse is a “catch all” phrase that has taken on many different 
meanings. Michael Brackett defines a data warehouse as “a repository of consistent 


historical data that can be easily accessed and manipulated for decision support.” [Ref. 3: 


es 


p. 268-269] Marc Demarest defines a data warehouse as “a consolidation point for 
enterprise data from diverse production systems.”[Ref. 6:p. 1] W.H. Inmon who coined 
the term initially defines a data warehouse as “a subject-oriented, integrated, time- 
variant, and nonvolatile collection of data in support of management’s decision-making 
process.” [Ref. 7:p. 2] For the purposes of this discussion I will use Inmon’s definition 


and correlate the ARIES project data resource file with this definition. 


The concept of “subject-oriented” 1s based on the change from application- 
oriented data to decision-support data. Because decision making is the focus, data in a 
data warehouse will be aligned around the major subject areas of an organization 
whereas operational data will be oriented towards specific business processes it is 
supporting. Operational application-oriented data are detailed data centered on functional 
requirements while data for data warehouses will only include data for conducting 
decision analysis. [Ref. 7:p. 3-4] The ARIES data resource meets these criteria because it 


contains unit readiness subject data to be used in the decision analysis of site desirability. 


Integration as a critical aspect of the data warehouse 1s an important step that does 
not always receive appropriate attention. The main focus of the integration process in 
data warehousing is to obtain consistency throughout the varying legacy databases from 
which the data are extracted. Consistency can be achieved 1n many different ways such 
as standardized naming conventions, measurement of values, encoding structures, and 
physical characteristics of data. Integration assures that data are stored in a single, 
globally acceptable manner even if the underlying legacy systems do not do so. [Ref. 7:p. 
5-7] Data warehouse integration was a critical objective in creating the ARIES data 
resource. Naming conventions were standardized (e.g., ZIPCODE, ZIPC, etc. were 
changed to ZIP), and some data items were manipulated to ensure the consistency of the 


encoding of the data item (e.g., Nine digit zip codes changed to five digits). 


One of the goals of decision analysis is to look at historical data in order to say 
something about the future. This leads to the need for a time element in the data 


warehouse to make it an effective tool for decision support. Time variance in a data 
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warehouse shows up in several ways. First, data warehouses represent data over many 
different periods of time, encompassing years, year-to-day, months, month-to-day, weeks 
and days. Second, the index key structure of the data warehouse in all cases maintains an 
explicit time dimension whereas operational databases are more likely to maintain the 
time element on an implicit basis. The difference is that the data warehouse will maintain 
a specific time element as a part of the index key. This is not the case in most operational 
data files where dates may be associated with the file themselves and not with each data 
element. Third, the data in a data warehouse are a series of snapshots from the 
operational database that cannot be updated. [Ref. 7:p. 8-9] In the ARIES context, time is 
a less important factor than in many data warehouses; specifically, time is visible with 
respect to the date of the extraction, and therefore the user is aware that the data are 


assumed to be accurate as of a specific date. 


The final defining characteristic of a data warehouse is nonvolatility. This 
concept rises from the idea that a data warehouse contains a snapshot of the operational 
data and will not be updated in a traditional sense. The only real functions that happen in 
a data warehouse are the action of loading the data into the warehouse and any actions 
accessing that data for the purpose of analysis. This concept provides a stable platform 
upon which the decision-maker can base decisions. The use of a separate data picture 
relieves strain on the operational databases from what would otherwise be exhausting 
analytical queries. [Ref. 7:p. 10-12] In the ARIES project, the Administrator is the agent 
that allows the data resource for the ARIES project to meet this criterion of a data 
warehouse. Each extraction of the source files is a snapshot of the source data at that 
time. One difference between the ARIES Administrator and “standard” data warehouses 
is that the ARIES data resource file is replaced in whole rather than created as an addition 
to previous extractions whereas a true data warehouse would build on this historical 
dataset while extracting and loading new data. The ARIES data resource differs from 
traditional data warehouses because it is designed specifically to take advantage of the 


spatial aspects of the underlying data sets. The use of a multi-criteria decision model to 
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determine the site desirability of an area drove the need to orient data based on its 


geographic content. 


“By any definition, however, a comprehensive data warehouse is much more than 
archived events equipped with a general purpose front-end query tool.” [Ref. 8:p. 1] As 
the definition of data warehousing continues to develop, applications of and uses for data 


warehousing will continue to expand and become more prevalent. 


2. Applications 


Traditional data warehouses fall into two categories, either Relational Database 
Management Systems (RDBMS) or Multi-Dimensional Databases (MDDB). Only in the 
past few years has data warehousing been viewed as a way for organizations to gain 
insight about the information embedded in their operational data sets. The necessity for 
Operational data to be reorganized and structured in a data warehouse architecture is 
driven by the need to maintain acceptable performance and integrity levels in both the 


operational and evaluational data sets. 


An RDBMS is a database system that organizes and accesses data as two- 
dimensional rows and columns. Data are organized so that related information can be 
accessed using Structured Query Language (SQL). Data that are linked together with 
common key values will support a certain level of data integrity, but may create a large 
amount of overhead at query time depending upon the complexity of the queries required 
to correlate data elements. Using RDBMSs to support complex analytical processing and 
decision support has been difficult. The performance of a RDBMS is hindered when it is 
forced to handle the complex aggregation type queries expected in a data warehouse 
environment. Each time a query is executed it must aggregate the data that the query is 
seeking. This sometimes involves millions of records. Until new technologies are 
developed and tested, RDBMS alone would not be the best choice for a data warehousing 


project that involves numerous complex queries. 
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An MDDB is a data base technology that represents multi-dimensional data as 
aggregations of data in cells that are the intersection of multiple dimensions. <A 
dimension is a table with a single-part key that relates directly to a fact table that in turn 
relates all the dimensions in a star-like structure using a multi-part key. Figure 6 shows 


an example of a fact table. 


Time Dimension 


Time_key . 
| DayOfWeek 


FiscalPeriod 














etc. ; 
Store Dimension ' mes 
| Store_key Time_key 
StoreName . Product_key 
PNolol $3) “| Store_key 
FloorType | Customer_key , 
Sie. a Clerk_key 
—ar-* Register_key = 
Cierk Dimension Promo_key ‘ | Customer_key 
Clerk_key =| 'y z CustomerName 
| s Dollars Sold 
ClerkName Units Sold . PurchaseProfile 
JonGrade . Dollars Cost etc. 
eter C4) 


Promo Dimension 


eS aici 
* "q 


| Register_key 
& Location 
Type 
etc. 





Figure 6. Example Fact Table. [Ref. 9:Figure 2] 

The fact table in a MDDB 1s used to traverse the data across multiple attributes 
quickly whereas dimension tables contain the actual descriptive data. [Ref. 9:p 2-3] Data 
in the MDDB will be stored in forms that facilitate the common usage patterns of users. 
Summarized data that are accessed frequently are preprocessed and made available for the 
user to query upon demand, unlike the RDBMS that would have to process the query 
dynamically each time there is a request for that data. This allows for quick retrieval of 


predefined calculations and efficient results. [Ref. 4:p. 6] 


The ARIES data resource file that was generated to support the decision goal of 
relating TPU readiness to site desirability does not fit either of the two traditional data 
warehouse types described above. Because the complex queries involved in the ARIES 


project required aggregating data elements across many data files, the use of a relational 


on 


model would have hindered the performance of the evaluation session. The queries for 
each decision measure, listed in Appendix A, did not require a multi-dimensional analysis 
and therefore there was no need for an MDDB model. The fact that the 17 different 
databases represent data from different areas did not allow for the separation of the data 
into formal dimensions. The geographical nature of the decision goal and all its data files 
having a spatial aspect qualifies ARIES as a special kind of data warehouse called a 
Spatial Data Warehouse. The term “spatially enabled” is used to describe data that have 
this spatial or geographical component. Spatial enabling allows data to be related across 
locations, boundaries and other defined lines that cannot be done easily in the traditional 


forms of data warehousing. 


3 Design Concerns 


Marc Demarest defines four fundamental goals of a data warehouse that serve as 


the basis for complex, forward looking business modeling: 


1. To protect production systems from query drain by moving query 
processing onto a separate system dedicated to that task, and extracting 
all the relevant information from each production data source at 
predictable times when off-peak usage patterns prevail; 


2. To provide a traditional, highly manageable data center environment 
for DSS using tools and practices comparable to those used in data 
center On-Line Transactional Processing (OLTP); 


3. To build a Unified Data Architecture (UDA) or Enterprise Data Model 
(EDM) in the warehouse, so that data from disparate production 
systems can be related to other data from different production systems 
in a logical, unified fashion. 


4. To separate data management and query processing issues from end- 
user access issues so that they can be treated as distinct problems. [Ref. 
6:p. 4] 
These goals provided the foundation for organizations to begin leveraging huge amounts 


of data they have maintained for years to gain a competitive advantage. They also 


provide a sound basis to begin the development of a data warehouse project but these 
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goals fall short in anticipating the dynamic nature of contemporary organizational 


computing. The last two goals prove to be the biggest drawbacks in this respect. 


Developing an in-depth UDA or EDM requires extensive resources and time to 
complete. Many businesses do not have sufficient physical or financial assets to devote 
to a project that may not deliver results for a relatively long period of time. Meanwhile, 
the rapidity of changes in business requirements will inevitably cause the EDM to 
undergo continuous renovation. As a result, these renovations may very likely be costly 
without providing timely responses to changes in the analysis needs of the user 


communities. 


The fourth goal of a data warehouse focuses on the data management and 
querying process, and maintaining these functions separate from the access available to 
the end-user. This goal is based on the need to perform these large-scale functions in a 
mainframe based application environment. The performance of current client/server 
desktop systems has put computing power more directly in the hands of the end-users 
who can handle portions of these tasks. The ability of the user to manipulate and analyze 


data directly is required in today’s dynamic business environment. [Ref. 6:p. 4] 


These shortcomings in data warehouse design architecture gave rise to a more 
flexible and less expensive solution to organizations’ data analysis needs. In 1991 the 
Forrester Research firm declared data warehousing dead and replaced it with a term they 
called data marting. [Ref. 6:p. 5] The next section will discuss the differences between 


data marting and data warehousing. 


B. DATA MARTS 


The terms themselves suggest that the difference between a data warehouse and a 
data mart would be in the size of data maintained. The difference in size may be true in 
most cases but more significant differences lie in the application and implementation of 
the project. Table 3 highlights some of the major differences between a data warehouse 


and a data mart. 
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Data Warehouse and Data Marts: What’s the difference? 
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Table 3. Data Warehouse, Data Mart differences. [Ref. 10:p. 9] 















Le Definition 


A Data Mart is a decision support database application that provides decision- 
making solutions for a narrowly specified group of knowledge workers. The data mart 
focuses on the needs of the knowledge worker and discounts the underlying production 
systems in an effort to provide a DSS solution for the workers. This focused approach is 
achieved by keeping the data mart oriented to one subject area versus the multiple 


organization wide approach of a data warehouse. 


Data marts are more appealing to the business community today because of the 
reasons mentioned in Table 3 (e.g., size, implementation time, and cost). The smaller 
size of the data mart compared to the data warehouse allows the information to be 
available to more users in the distributed desktop environment that characterizes today’s 
business world. [Ref. 11:p. 1-2] This philosophy allows the use of systems that are 
already in place on decision-makers’ desks to conduct detailed decision analysis without 


the requirement of investing in large amounts of hardware. 


The lag time between implementation of a project and output of some useful 
results is an important issue in the overall success of a project. Josh Bersin, Group 


Director of Data Warehouse Solutions at Sybase, Inc. indicates that a data mart must 
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deliver results in the first 90 days. [Ref. 11:p. 2] Because a data mart solution can be 
designed and implemented in a fraction of the amount of time, i.e., months versus years, 
it is able to adjust more rapidly to the changing business environment. Data marts should 
be designed with the concept of expandability in mind because, as users explore the 
information available, they will want to look at the data in ways for which it was not 
originally intended. The capacity of the data mart to be flexible and adjust to the user 
provides the additional feature of scalability. 


The ability of an organization to implement a data mart quickly using existing 
hardware infrastructure provides an immediate cost benefit. In today’s business world 
where every dollar expended is scrutinized closely, it is important to provide business 
solutions that offer a competitive advantage at a minimum cost. Data marts provide this 
advantage in their specific subject area. It is important for the organization to ensure that 
data marts are not built in a vacuum and that each data mart is designed with the 
enterprise wide data model in mind. This will prevent the proliferation of stovepipe 


systems. [Ref. 11:p. 1-2] 


Marc Demarest first discussed the concept of integration of data marts across the 
organization in 1993 when he recommended his solution to the enterprise-wide decision 
Support problem. Instead of using only a data warehouse or data mart he recommended 
the use of a hybrid data architecture. Demarest’s thinking was ahead of its time and he 
was the target of some scathing criticism for suggesting the combination of the two 
philosophies. In his article “Building The Data Mart”, he laid down an architectural 
model for combining a single warehouse and multiple data marts into one integrated 
enterprise decision tool that has become one of the most popular designs of enterprise- 


wide decision support. [Ref. 6:p. 1] 


The ARIES project data resource 1n many ways fits the definition of a data mart. 
It was implemented in a matter of months, developed on a limited budget, and was 
designed to support a specific decision process for front-line users. However, in other 


ways it does not fit the traditional form of a data mart. The ARIES data resource file is a 
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collection of resources that has been aggregated and manipulated to take advantage of the 
spatial aspects of each data set. The data files are not maintained in one of the traditional 
formats of a data mart (1.e., relational or multi-dimensional), but rather are maintained as 
separate tables that will provide data for each of the twenty decision factors in the most 
expeditious way. The need to gain performance speed during the querying process forced 
the use of many geographic queries that quickly localize data by the spatial elements that 


are already present in the data sets. 


jae Applications 


Data marts fall into similar categories as the data warehouse that are based on the 
intended use and types of data that are to be manipulated. The two categories are based 
on the same design principals as discussed earlier in this chapter for data warehouses, 
multi-dimensional and relational. A choice between these two design architectures is 


based on the type of analysis to be done as well as the type of data to be analyzed. 


MDDM data marts are used to look analytically at the same data in different 
ways. They maintain large amounts of numeric data such as sales data. Once the data are 
loaded, either from the data warehouse or from external sources, it is maintained in a very 
structured framework. MDDM data marts are most effective for analyzing numeric data 
in an ad hoc manner. This approach to analytical processing for decision support is called 


on-line analytical processing (OLAP). [Ref. 12:p. 4] 


The relational data mart uses a form of analysis processing known as Relational 
On-line Analytical Processing (ROLAP). ROLAP data marts support a much wider 
range of purposes for numeric and textual data and therefore allow for the use of a more 
general purpose decision tool than the MDDM counterparts. They provide, through the 
use of relational technology, the ability to conduct both disciplined repetitive queries and 
ad hoc usage. The data mart concept has its foundation 1n providing the knowledgeable 
user with the decision support tool that fits their needs and provides access to just the data 


that the user needs to see. 
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3. Design Concerns 


In designing a data mart the philosophies are founded on the idea of an application 
being user-oriented in nature. The system is designed to provide the smaller set of users 
with the exact data set they are going to be using versus an enterprise wide set of data for 
possible decision concerns. This concept provides for a somewhat different set of 
processes to be conducted during the design phase. Marc Demarest identified four 
distinct processes: [Ref. 6:p. 8] 

e Extract all data relevant to the business decision-making of the groups of 

knowledge workers 

e Store the resulting data sets in one location: the data warehouse. 


e Create a unique cut or series of cuts of the data warehouse for each knowledge 
worker community. These are the “data marts”. 


e Supply the decision-support tools appropriate to the knowledge workers’ style 
of computing. 


The extraction process involves the translation of the data to standard formats, 
scrubbing the data for anomalies, and copying only the data elements required for 
decision-making. This extraction process creates a subset of the operational data set 
known as a “cut.” Storing the data in one location provides a big picture of the business 
for the major processes in the organization. Because different users throughout an 
organization apply different aspects of the data it is important to provide each group of 
users with a specific cut of the data warehouse for their use. This is done with use of data 
marts. Finally and most importantly, decision-support tools must be provided that match 
the skill sets of each group of users. If the user is unable to access the resource, it is not 


an asset but a liability. 


The ARIES project conforms to these four distinct processes. All pertinent data 
are extracted to a single location. The extracted data are intended for use by a specific 
group of users and in the format that USARC designed. The data are provided through a 
powerful decision-support tool that provides extensive functionality for the common user 


as well as detailed analysis capabilities for the knowledgeable user. 
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Data warehouses and data marts have the same foundation. The data they 
represent is the detailed data maintained in operational transactional data files. Therefore, 
the quality of the data retrieved from the data warehouse or data mart is directly related to 
the quality of the data in the underlying operational data sets. The next section will 


discuss the issue of data quality as it relates to the data warehouse environment. 


Cc; DATA QUALITY 


The success of a data warehouse or data mart project can not be insured by the 
best user interface or the newest database technologies if the underlying data is incorrect. 
The quality of the data obtained for use in any decision support tool becomes a hazard 
that the users must recognize. It must be managed during the design, development, and 
implementation phase of any project. Consider the following examples: 

e Inaccurate data related to categorization of bank customers resulted in 

erroneous risk exposure estimates, leading the bank to believe it was 
more diversified than it was. When the oil market softened in Texas, 


banks having a large number of Texan accounts suffered a major loss 
because of the inaccurate representation of risk. [Ref. 13:p. 1] 


e A senior level military officer was defending the defense budget before 
the U.S. Senate. When questioned about a discrepancy in the number 
of authorized officers shown in the proposed budget versus the | 
congressional numbers, no one could explain the discrepancy. That day 
the military lost 2,500 authorized officers it needed because Congress 
liked the lower number. It turns out that a data timeliness problem 
within one of the data warehouse source systems were the reason for 
the discrepancy. [Ref. 13:p. 2] 


These examples show the necessity of evaluating the value of the data that a decision- 
maker is using to make important decisions. This section will define data quality, 
identify the attributes that make up the quality of data, and discuss the elements that have 


the greatest effect on data integrity. 
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1. Definition 


With the operative word in data warehousing and data marting being “data,” the 
adequacy of the underlying data used to build the data warehouse must be determined. 
“Data Quality” is the term used to identify and manage the effects of inadequacies of the 
data in a decision-support environment. Defining data quality and maintaining a level of 
data quality is a difficult task. It is a common theme throughout the literature that 
determining the quality level of data used for decision-making is important to the 


eventual success of any data warehouse project. 


Data Quality is defined by Ken Orr, of the Ken Orr Institute, “as the measure of 
the agreement between the data views presented by an information system and that same 
data in the real-world.” [Ref. 14:p. 2] Richard Wang and Yair Wand define data quality 
as “a multi-dimensional concept made up of dimensions like accuracy, completeness, 
consistency, and timeliness.” [Ref. 15:p. 87] Michael Brackett states that data quality is 
an indication of how well data in the data warehouse meet with the business information 
demand and includes data integrity, data accuracy, and data completeness. [Ref. 3:p. 144] 
Duane Hufford says “data quality is the state of completeness, validity, consistency, 


timeliness and accuracy that makes data appropriate for a specific use.” [Ref. 13:p. 1] 


It is easy to identify several common elements in the attributes that all of these 
individuals believe make up data quality. For the purposes of this discussion I will use 
the definition that is published by the Defense Information Systems Agency (DISA) in 
the DoD Guidelines on Data Quality Management that identify six characteristics of data 
quality: accuracy, completeness, consistency, timeliness, uniqueness, and validity. [Ref. 
16:p. 2] Table 4 gives a description and an example of each of these six characteristics. 
The application of the use of these six characteristics provide a sound foundation for any 
organization to begin the process of identifying a confidence level for the quality of data 
that are being used for decision-making. These characteristics are present in most legacy 
databases in some form, which makes any data warehouse to which this information is 


migrated susceptible to the same types of errors present in the legacy system. “One data 
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Consistency A measure of the degree to Percent of matching values across 
which a set of data satisfies a set | tables/files/records. 
of constraints. 

Timeliness It represents the degree to which | Percent of data available within a 
specified data values are up to specified threshold time frame(e.g., 
date with the real-world. days, hours, minutes) 

Uniqueness The state of being the only one Percent of records having a unique 


of its kind. Being without an 
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Validity The quality of data that is | Percent of data having values that 
| founded on an adequate system | fall within their respective domain 

of classification and is rigorous | of allowable values. 
enough to compel acceptance. 


primary key. 





Table 4. Data Quality Characteristics. [Ref. 16:p. 2] 


manager for a large company reported that fully 60% of the data that was transferred to 
their data warehouse failed to pass the business rules that the systems operators said were 


in force.” [Ref. 14:p. 6] 


Because the recipients of this “dirty data” are the resources that organizations are 
using to make “fact based” decisions from, it is important to understand data quality as it 
relates to the project. Organizations create data warehouses and DSSs to avoid making 
inaccurate assumptions about their business. They should then not make assumptions 
about what makes up the data set used to load the data warehouse. [Ref. 17:p. 1] Creating 
a plan to improve data quality is part of understanding that legacy data itself will not meet 


the quality standards required to make decisions. 
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om Improving Data Quality 


The process of improving data quality is an incremental process that can be 
conducted in all phases of a project from birth to implementation and beyond. 

Although database consultant Ken Orr has called data warehouses the 

sewage treatment plant of enterprise data, this is not the objective of data 

warehousing. It is, unfortunately, the unintended result of loading legacy 

data that has not been subjected to data-quality improvement. [Ref. 18:p. 

1} 
To improve the quality of data in any system you must first determine the baseline quality 
of the current data set must first be determined. This is done by comparing actual data 
instances against the established rule sets that have been established. The rule sets will 
become the metrics like those shown in Table 4 and will be documented in the meta-data 
of the data files for future reference. If these rules are not documented, the first step in 
improving data quality is to document the business rules that the current data represent. 
An important part of this initial assessment is to identify the responsible stakeholder or 
stakeholders and to get them involved in the improvement process. Once the data quality 
baseline assessment is complete the next step is to determine and document the level of 


quality required by the intended business use. 


There will be different levels of quality required for data depending on the 
intended use. Ken Orr in his discussion of Data Quality and Systems Theory states that: 


No serious information system has a data quality of 100%. The real 
concern with data quality 1s to insure not that the data quality is perfect, 
but that the quality of the data in our information system is accurate 
enough, timely enough, and consistent enough for the organization to 
survive and make reasonable decisions. [Ref. 14:p. 3] 


A system that is used to make life threatening decisions will have need for higher quality 
data than system used to identify the placement of a new facility. This required quality 
level or the “enough” that Orr speaks of can only be determined by the users of the 
system. Once the user determines the level of quality required, it becomes the goal of the 
project to attain that level. Part of attaining that level is to have in place the mechanisms 


to measure and maintain that quality over the life of the project. Where possible this 
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should be done at the origination point of the data instance. Larry English states that “‘to 
improve warehouse data quality you must improve the business processes that produce 
the data.” [Ref. 16:p. 4] If this is not possible, quality controls must be put in place in the 
migration path of the data from the operational source to the data warehouse or data mart. 
This process is known as “data cleansing.” Because the source files that were used in the 
ARIES project are not in direct control of the customer, the process of data cleansing was 


used extensively in the development of the ARIES data resource file. 


3. Data Migration 


The operation of moving data or loading it into the data warehouse is not a simple 
task and requires substantial planning. As pointed out previously in this chapter, if you 
simply move the data from the operational system into the warehouse you are moving the 
data problems as well. The process of migrating the data therefore becomes a point at 
which problems with the legacy data can be identified and possible solutions can be 


developed. 


The migration process involves a method to transfer the data to the target data 
warehouse, transformation of the source data into the data warehouse architecture, and a 
method to clean or scrub data problems. Figure 7 is a diagram of the migration process 


that shows the flow of legacy data into the target data warehouse. 
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Figure 7. Data Migration Process 
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The perfect migration process would involve continuous measurement of the data 
being transferred against the business rule sets that is in place and has the ability to 
provide feedback in the form of a control system. The concept is discussed both by Ken 
Orr in his paper on Data Quality and Systems Theory [Ref. 14] and in the DoD 
Guidelines for Data Quality that is published by DISA [Ref. 16]. 


The function of this migration process is handled by the ARIES Administrator in 
the ARIES project. The Administrator maintains the business rules that USARC 
developed and conducts the transfer of data from the source files to the target data 
resource file. It does not contain the ability to ascertain the quality of the data being 
transferred or measure that quality against any form of metrics. This process must be 
conducted outside of the automated migration process provided in the ARIES SDSS 


project. 


D. CHAPTER SUMMARY 


The philosophies and technology involved in the data warehousing process are 
currently in their infancy. The concept is growing rapidly as is shown by its wide 
acceptance in the leading business communities. As long as the concept of subject- 
oriented, integrated, time-variant, and nonvolatile data collections continues to provide 
businesses with a competitive advantage, the data warehouse will be at the center of the 
enterprise decision-making tool set. Data marts have already proven to be an acceptable 
way to provide a specific cut of data to a particular group of users with the an intent of 
providing greater accessibility. The use of data warehouses and data marts in a common 
architecture will provide even greater access to the enterprise data. Greater access will 
give rise to new uses for that data and allow the users to maximize their use of the 


available data. 


The ability of an organization to take advantage of “spatially enabled” data is a 
new concept that requires much more research and development. Data that are spatially 


enabled will begin to link data that otherwise had no common link and will allow the 
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organization to fine tune their decision-making on a new level. The ARIES project is a 
working prototype of an application that has made every attempt to maximize the spatial 
elements of the source data. This has allowed the automation of a decision process 


previously thought to be too complex to be handled in a client/server environment. 


One aspect of the data warehousing and data marting concept that is consistent, as 
long as the data being warehoused is legacy data, is the need to determine the quality 
level of the source data sets. “Data quality” is defined by six characteristics of data: 
accuracy, completeness, consistency, timeliness, uniqueness, and validity. The quality of 
the data that are being used can be improved during all phases of a project. The 
improvement process involves identifying ownership of the source data, conducting a 
data quality baseline assessment, and determining the required level of quality for the 
system. For the quality improvement plan to be effective it must provide a mechanism of 
providing feedback to the data systems for the life cycle of the system. This continued 
quality assessment commonly takes place during the migration process. This process 
allows the data to be transferred, transformed, and scrubbed to meet the requirements of 


the data warehouse architecture. 


Chapter IV will discuss the how the quality level of the source data files for the 


ARIES SDSS project were determined and identify the anomalies that were identified. 
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IV. ARIES SDSS APPLICATION: PROBLEM EVALUTION 


The ARIES SDSS prototype was developed in direct response to the requirements 
set forth by the intended users at USARC. The final prototype is a result of continuous 
adaptations to changing user requirements, usability improvements, and solutions to 
system design and implementation problems. Problems were encountered in several 
areas during the development process; business rule development, query performance, 
and data anomalies. This chapter discusses specific examples of problems from each of 


these areas as well as the methods used to overcome them. 


A. BUSINESS RULE DEVELOPMENT PROBLEMS 


The business rules of the SDSS are the links that connect the hierarchical decision 
goals with actual data instances in source data files. The business rules of the ARIES 
project decision measures are described in detail and listed in Appendix A. An example 


of an ARIES SDSS business rule is the following: 


Area Loss Rate Is equal to the number of losses to units in the area 
during the previous fiscal year divided by the total 
number of reservists currently assigned to those units. 
Losses are determined by counting the entries in the 
FYxxLOSS file where the data element “TRMN” equals 
“LOSS” that are associated with each VIC in the area. 
The total number of assigned reservists is determined by 
counting all of the personnel records in the GI8CWE 
file associated with each UIC in the area. “In the area” 
is defined as within a 50 mile radius of the proposed site. 


Some of business rules documented during the project development phase were 
flawed in their initial assumptions. The problems encountered fall into two main 
categories: errors in logic and rules not support by data. The next sections will discuss 


examples of these problems. 
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1. Errors in Logic 


An example of a logic problem occurred with the requirement by seven of the 20 
measures to identify all units “in the area” of a proposed facility. The business rule 
developed by the expert panel at USARC identified a file called COMMAND PLAN as 
the best source from which to obtain this information. This was based on the assumption 
that COMMAND PLAN contained entries for every unit and facility in the Army 
Reserve. After reviewing the actual data entries in the file, however, it was found that the 
file actually contained multiple entries for each site, including information on closed and 
proposed sites as well as active sites. It was therefore impossible to identify a list of valid 
units or facilities solely from this data source. When the system was run under this 
assumption the program failed because of multiple entries with same values in a key 


index field. 


It was then determined that this information must be derived from a complex 
query that matched the entries in COMMAND PLAN against the G17 source file that 
lists basic facility information (Table 5). This query, referred to as VALID_UIC, creates a 
table of UICs that is created and only stored in the ARIES SDSS data resource file. 


Because the COMMAND PLAN source file contains historic, present, and future entries 


VALID _ UIC 

SELECT UIC, FAC_ID, UnitName, City, State, Zip 

FROM G17Natl 

WHERE GI17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Table 5. VALID UIC Query 


for each site, a filter query was added to the extraction process in the Administrator to 
load only data instances from COMMAND PLAN that are valid over the next 13 months, 
as shown in Table 6. The combination of these two queries solved the problem of 


identifying valid Army Reserve units. 
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CMDPLAN 

SELECT DISTINCT UIC, FACID AS FAC _ID, EDATE 

FROM COMMANDPLAN 

WHERE (FACID < “N/A”) AND (FACID < “TBD”) AND 
(LEN(FACID) > 2) AND ((LEFT(EDATE,4) = ‘CCYY’ 
AND MID(EDATE,5,2) <= “MM”) OR (LEFT(EDATE, 4) 


<= ‘CCYY’ 
ORDER BY UIC, EDATE DESC 
INTO CMDPLAN 
INDEX On UIC As VIC 
Note: Application automatically adjusts the dates to obtain a 13-month 
window. 


Table 6. COMMAND PLAN Filter Query 
ip Rule not Supported by Data 


An example where underlying data did not support the business rule was 
uncovered while determining the value for backlogged maintenance actions of a facility, 
Measure #1 in Appendix A. The original business rule applied a criterion of totaling only 
the “K-account” unfunded requests, identified by the fund code of “BMAR”. After 
reviewing the data file and attempting to implement this business rule, it was discovered 
that only a small number of facilities had any entries with the BMAR fund code. Because 
a value of zero for backlogged maintenance receives the maximum utility, as shown in 
Appendix A, this error would have seriously overestimated the contribution of 
backlogged maintenance to site desirability and potentially biased the outcome of the 
decision model significantly. This problem was solved, after discussions with USARC, 


by totaling all the unfunded requests without concern for the fund code. 


B. QUERY PERFORMANCE PROBLEMS 


The user did not care how long the evaluation took as long as it was an automated 
process. The ARIES SDSS project requirements offered little insight about the expected 
computer execution time a site evaluation would take. During the beginning phases of 
prototype development, query times in excess of one and half hours were common. This 
long evaluation time, though still many times faster than the current manual process, was 
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considered unacceptable for an automated decision support implementation by the 
development team. Furthermore, the source files in use during the beginning phase only 
contained data for the state of Pennsylvania. Extending the data sets to the national level 
promised some truly staggering execution times. Efforts were undertaken immediately to 
streamline the lengthy query by focusing on two areas of the querying process: (1) the 
use of geo-queries in place of standard SQL queries and, (2) aggregation of detailed 


information into smaller data sets. 


| SQL vs. Geo-Query 


After reviewing each measure in detail, it was determined that 14 of the 20 
decision measures were dependent on a spatial element query (e.g., Number of Reservists 
within 50 miles). Because the application was using a GIS system already, a natural 
course of action was to leverage the powerful geocoding abilities of MapInfo™ to 
conduct spatial queries as a way to reduce the overall query time. The MapInfo™ queries 
executed three to four times faster than conducting the same query through standard SQL. 
Geocoding a source file allows MapInfo™ to localize the desired records and only look at 
a subset of the data file. A counterpart SQL command, on the other hand, would attempt 
to match each record in the data file. The processes of passing the queries to MapInfo™ 


reduced the evaluation time from hours to less than ten minutes. 


Another obvious advantage to using MapInfo™ to conduct other queries 
involving a list of items with a specified area of a geographical location. This feature was 
also used to determine the following lists for use elsewhere in the application: units 
within 50 miles, facilities within 50 miles, Army reservists within 50 miles, members of 
the National Guard within 50 miles, Individual Ready Reserve (IRR) individuals within 


50 miles, and a list of zip codes within 50 miles. 
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2. Detail vs. Aggregation 


One of the factors that led the development team to the incorporation of data 
warehouse concepts in this project was the requirement which surfaced during 
development to improve query performance. As discussed in Chapter III, a data 
warehouse is an optimized data store used to provide data in a structure or format that 
will maximize the performance of the DSS tool set. Aggregating the detailed information 
in source files into summarized tables is one of the techniques use to increase 
performance. The use of aggregation to improve performance was an idea that was used 
heavily in developing the ARIES data resource. Because a number of the queries 
required counting the records that match a particular criterion, the concept of aggregation 


provided an obvious advantage. 


An example of the benefits of aggregation can be seen in the counting of the 
number of individual reservists assigned to a list of units. The original process would 
have conducted a complex query that counted the number of entries in the GI8CWE 
personnel file that had a UIC matching any one of the UICs determined to be in the area 
of the proposed site. The G18CWE data file has in excess of 200,000 records. Matching 
each entry against a list of any substantial number of UICs took in excess of an hour. The 
solution was to add an aggregation query, shown in Table 7, to the data preprocessing 
phase that counted the entries for each distinct UIC and maintained only that total. After 
implementation of this query the application would then only query the aggregate table 
for each UIC on the area list and conduct a simple summation query. This form of 
aggregation was implemented in the extraction process for four of the 17 databases: 
FINANCE, FYxxLOSS, G18CWE, and RPINFODT. Those queries are listed in detail in 


Appendix B. This process moved the query time required for the counting process into 


SELECT UIC, COUNT(UIC) AS UIC_TOTAL 
FROM G18Natl 

ORDER BY UIC 

GROUP BY UIC 

INTO GI18Natl_ UIC 


Table 7. Example Aggregation Query - G18Natl UIC 
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the data preprocessing session and out of each individual evaluation session, reducing site 


evaluation time by about half. 


CG DATA ANOMALIES 


The final and most intricate problem area was the quality of the source data. As 
discussed in Chapter III, the solution a DSS provides 1s only as good as the data on which 
it is based. The quality of the source data files for the ARIES SDSS project provided a 


substantial challenge to the development team. 


Early in the development process, the frequency of data values that were missing 
or null caused multiple error conditions in the applications. A need arose to identify 
missing data values with a default error value so the application did not have to contend 
with null values. A value of “-999” was returned for a decision measure as the default 
error value. Flagging this one error value identified many inconsistencies in the source 
data files. Additionally, the magnitude of the number of default error values that the 
system returned became a concern during initial testing by the user. The initial intent of 
inserting error values was to allow the user an opportunity to enter a subjective value in 
place of the missing value. This subjective value would hypothetically allow the decision 
modeler to provide a better approximation of site desirability. Some runs of the 
application returned so many default error values, however, that the user would have been 
entering more values for decision measures than the automated application returned, thus 
defeating the primary purpose of the system. This was deemed an unacceptable 


condition. 


1. Proxy Value Calculations 


It was decided that an interim solution to the number of error values returned was 
to determine a proxy or default value for each measure. These proxy values could be 
substituted automatically in place of the “-999” values to allow the decision model 


evaluation to be conducted without sacrificing authenticity completely. Determining a 
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value that would provide an accurate representation for a given measure required the 
calculation of basic descriptive statistics for each measure (e.g., Mean, Standard 


Deviation, Minimum and Maximum values). 


Descriptive statistics could be calculated for 17 of the 20 decision measures for 
each facility because the business rule did not depend on knowing the Moving Unit. 
These measures were facility oriented or oriented on the area around the proposed facility 
independent of any characteristics relating directly to the moving unit. The three 
measures that could not be calculated in this way because they do depend on moving unit 
characteristics are: Number of Reassignments from the Moving Unit, Available 
individuals with MOSs of interest from Closing Units, and Available IRR individuals 
with MOSs of interest. 


In an attempt to determine a source for the error conditions, as well as calculate 
descriptive statistics, a complete evaluation of all possible sites was conducted. This 
“global” evaluation process allowed the application to be tested to the full extent of the 
data set and assisted the development team to identify potential problems quickly. The 
procedure was conducted twice and required the application to run without interruption in 
excess of a week (using a Pentium 90MHz personal computer). The resulting descriptive 
statistics for 17 of the 20 decision measures are listed in Table 8. Appendix C contains a 


detailed listing of the descriptive statistics and frequency data for each decision measure. 


jig Data Validation 


Because of the enormous amount of missing or null values that the system was 
returning during initial evaluation sessions, 1t became necessary to verify and validate the 
data set in use for the ARIES SDSS prototype. This was required to localize the problem 


and determine if the problem was with the data or in the application implementation. 


Before the data set could be validated, an appropriate range of data values had to 
be determined for each measure. The limits of the ranges were determined using a rule of 


reasonableness to identify values that would adversely affect the evaluation process. 
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Note: Measures 15, 19, and 20 are dependent on the Moving Unit 
Total Number of Facilities (N): 1325 


18. Facility Weekends Used 1,320 


Table 8. Descriptive Statistics for Decision Measures 


Consideration was given to the following areas; (1) the range of values returned during 
the evaluation process, (2) expected values based on the Yield Curves, and (3) common 
sense (i.e., a value of zero for Facility Age was not considered reasonable). The valid 


ranges for each measure are listed on Table 9. 


As indicated in Table 9, major problems with at least six of the 20 decision 
measures were identified. The validation was conducted on the list of 1523 available 
facilities. The review identified a serious problem with files which contained missing 
values; in some cases the files were missing as much as 57% of the values (Area Drill 
Attendance). A detailed review of the data files in question determined that the files did 
not contain information for any state other than Pennsylvania. This problem was a result 
of the user requirements from a prototype for Pennsylvania data to a national data set. 


This change occurred during the development process without the data files being 
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Table 9. ARIES Measures Analysis Statistics - Run #1 


updated to match the new national search requirements. Databases containing nationwide 


information were used to conduct a second analysis and evaluation session as discussed 


later in Table 10. 


A problem was also identified when using a ratio as the measuring metric. A ratio 
does not reflect the magnitude of the underlying values used to obtain that ratio. In the 
case of the value for Area Drill Attendance, for example, the application returned a value 
of 0.8 for a facility, which would be considered within the expected range. The value for 
Area Drill Attendance involves calculating the ratio of reservists meeting satisfactory 
drilling requirements to the total number of reservists required to drill at a facility. The 
application was constructed to store the interim values of DRILL_SAT and 


DRILL TOTAL, which were calculated to be 4 and five 5 respectively, resulting in a 


49 


ratio of 0.8. However, these numbers did not match the total number of reservists 
actually assigned which was 2496. This wide disparity was traced back to the same 
problem leading to 57% of the facilities having missing values for Area Drill Attendance. 
This problem was resolved when the FINANCE file was updated to reflect entries on a 
national basis. However, it is still possible for a measure represented as a ratio to hide 
potential data problems. A good strategy in this case is to display the basic values that 
comprise the ratio in addition to the ratio itself. Five of the decision measures in the 
ARIES SDSS project use ratios: Area Drill Attendance, Area Loss Rate, Area Transfer 
Rate, Average Area Manning, and Reassignments. A detailed description for the 


calculation process for each measure 1s shown in Appendix A. 


One potentially deceptive measure for which values calculated during the analysis 
may provide false feedback is Area Available Closing Units. This measure is used to 
determine the number of available reservists from units that are scheduled to close in the 
area of a proposed site. Because the number of closing units is small in comparison to 
the number of active units, not all facilities will have units scheduled for closing within 
50 miles. However, the application will return a default error value for this case that 
should not be considered an error. In this case a value of zero could be the valid answer. 
Because it 1s difficult to distinguish between missing data or the fact that there may be no 
closing units, the application currently does not compensate for this situation and the 


correction is left to the user. 


A second analysis session was conducted on a list of facilities known to be active 
Army Reserve facilities in the continental United States. The orginal list of 1523 
facilities was pared down by a total of 198 sites to a total of 1325 by removing facilities 
in remote locations, facilities marked as not existing, and facilities marked as temporary. 
Appendix C contains the frequency data and descriptive statistics for each measure that 
was produced by this second analysis run. Table 10 shows the validity analysis statistics 


for each measure. 
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Table 10. ARIES Measures Analysis Statistics - Run #2 


Five of the measures in this run fall below a 90% potentially valid level. Three of 
these measures (Facility Backlogged Maintenance, Facility Operating Cost, and Facility 
Age) are facility related and are the result of missing data or, in the case of facility 


operating cost, values of Zero. 


The missing values for Area Available Closing Unit, as discussed above, are a 
result of proposed sites not having any closing units within the geographic area. In this 
case it would be acceptable for a site to return a null value for this measure. The values 
for the Distance to ECS measure are a result of the Army Reserve having only 30 ECS 
sites nationwide. As a result some sites will have a distance of greater than 500 miles to 
the nearest ECS site. This “out of range” distance value will be interpreted by the 


decision model and assigned the minimum utility value based on the shape of the yield 
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curve. Hence, these invalid range problems for these two variables will not adversely 


affect the decision model. 


Missing or null values in the data files are identified in the application by a default 
error value which can be changed easily by the user in order not to affect adversely the 
outcome of the evaluation. The values that are out of range, on the other hand, can have a 
direct, negative impact on an evaluation if they are not identified and corrected. Consider 
Facility Operating Cost measure as an example. Of the 18.3% of the values that were 
found to be out of range, all but two were equal to zero. This is a concern in the 
evaluation process because a value of zero will receive the maximum utility during the 
decision analysis phase. It is true that a closed facility or a proposed facility would not 
have a current value for the operating costs but in the case of an active facility this value 
should be something greater than zero. Currently this situation requires the 
knowledgeable user to intervene and apply a reasonable value. This is a particularly 
insidious example of how incomplete or inaccurate values in source files can filter 


through the data warehouse into the DSS. 


RP Data Quality Analysis 


The data analysis conducted for the SDSS application was only concerned with 
determining data validity. A complete analysis would consider all the characteristics of 
data quality discussed in Chapter III including accuracy, completeness, consistency, 
timeliness, and uniqueness as well as validity. Below we present some examples of data 
problems encountered during the development phase, which relate to these other data 


quality characteristics. 
Accuracy 


Problem: (Facility Condition - Measure # 4) 
Every value for facility condition is “GREEN”. It is unreasonable 
to expect that no facilities would be coded either AMBER or RED, 
therefore it seems quite unlikely that these values match the actual 
condition of each facility. 
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Solution: 


Completeness 


Problem: 


Solution: 


Problem: 


Solution: 


Consistency 


Problem: 


Solution: 


Problem: 


Solution: 


Problem: 


Solution: 


These values must be updated by the owner/stakeholder of the 
operational data file. 


(Facility Age - Measure # 3) 

The source data file is missing 42% of the facilities deemed to be 
valid from the GEOREF file. 

These values must be updated by the owner/stakeholder of the 
operational data file. 


GEOREF File 

Zip codes are missing on 96 of the facilities listed in GEOREF, 20 
of which are marked as valid facilities. 

These values must be updated by the owner/stakeholder of the 
operational data file. The zip code is used to geocode these 
facilities for possible selection as a proposed site. 


Zip Codes are of varying length in all the data files. Some files 
contain the nine digit zip codes while others only maintain five 
digit codes. 


In order to query On zip codes and obtain an exact match, the use 
of five digit zip codes was adopted. This was done during the 
extraction process by only loading the first five digits of a zip code 
from all the files. 


UICs are not represented uniformly in all the source data files. 
Some files use a UIC designation that does not include the letter 
designating an active unit. Other data files use a parent UIC 
instead of the UIC of an actual unit. (e.g., unit structure is against 
the parent UIC, .....AA, whereas the person assigned to a billet is 
assigned at the platoon level UIC, .....A1) 


Ensured all UICs were six digits in length. 


Data entries for facilities and units in source files do not match the 
list of valid units. 


Not corrected. Action required by the owners of the source data 
files. 
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Timeliness 


Problem: (Area Loss Rate, Area Transfer Rate, Measures # 8 & 9) 
FYxxLOSS File used to determine loss and transfer rate can be as 
much as twelve months out of date. 


Solution: No current solution. 
Uniqueness 
Problem: The data files do not have unique indexes. The lack of a unique 


list of facilities and units has allowed entries in source data files for 
sites that do not exist. 


Solution: No current solution. 


These examples are not exhaustive and are only intended to be representative of the 
problems that exist in the source data files for the ARIES SDSS project. Further analysis 
is required in the area of the other five data quality characteristics to determine the overall 
level of data quality that is present in the final application. Because the SDSS is based on 
data files that are updated on a frequent basis, there is a need to monitor the quality level 


of the data following each future extraction. 


D. CHAPTER SUMMARY 


As with any prototype DSS development project, substantial problem areas arose 
with respect to data quality issues. Three major categories of data quality problems were 
identified: business rule development, unacceptable query performance, and data 
anomalies. Though these problems provided a considerable challenge to the development 


team, acceptable solutions have been implemented in most cases. 


The documentation of the business rules provides the building blocks of the SDSS 
application. In the ARIES application, problems with the business rules manifested as 
logic errors or lack of data support. Solutions to these problems can be developed for 
both types of business rule errors. Unfortunately, these errors usually show up during the 
initial phases of the application development and must be resolved to allow the 


application to continue with development. 
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Because the SDSS application is query intensive, the performance level of the 
application will depend on the ability to carry out each query in the most efficient 
manner. The ARIES application required performance enhancements in two areas: geo- 
queries for spatially related queries and preprocessed data aggregation. These relatively 
simple solutions provided a performance enhancement that reduced the evaluation time 


for each site by an order of twenty fold, from hours to a matter of minutes. 


The validity and quality of the source data directly affects the quality of the 
resulting evaluation in the ARIES SDSS application. An understanding of the data 
problems involved in the ARIES project did not become a concern until the application 
was in full development. Because of the magnitude of the problems, steps were initially 
taken to localize the source of problems. Through a detailed validation analysis, a large 
portion of the data anomalies was corrected. Data problems were handled either at the 
preprocessing stage with filtering queries or else corrected at the source file. The 
correction of these anomalies on a post facto basis consumed a major amount of time that 
detracted from development. This opportunity cost of undertaking remediating action 


could be reduced significantly with prior planning. 


The problems that the ARIES SDSS prototype application encountered were in 
most cases not anticipated. The need to reduce the impact of these problem areas on the 
application development process requires changes in the initial steps of the SDSS 
development process. Chapter V discusses the lessons learned from this initial prototype 
development and proposes several requirements that should be added to the development 
process in order to identify solutions for these problem areas earlier in the project life 


cycle. 


2. 





V. LESSONS LEARNED: SDSS DESIGN AND DEVELOPMENT 


The ARIES SDSS project prototype is in final implementation, but it did not get 
to that point without encountering a number of major problems that could have been 
avoided with prior knowledge. The lessons learned from this project will be valuable to 


the development of second-generation SDSS projects. 


The majority of the problems with the ARIES project centered on the wide 
disparity between the initial system concept and the final product. The original goal of 
the project was to develop a decision model for the TPU readiness issue. However, the 
final product is a fully functional automated decision support tool. As the ARIES project 
grew with each new functionality the development process itself received less and less 
attention. Writing and testing code became the focus. While this is a common scenario 
for prototype applications, serious problems can arise when a decision is made to take the 
prototype to full implementation with little planning, which is what occurred in this 


situation. 


This chapter discusses recommended changes to the SDSS development process 
to avoid the pitfalls that hampered the development of the original ARIES prototype 
application. These changes include the addition of a data migration plan, refinements to 


the decision model, data model, and system design, and future considerations. 


A. SDSS DEVELOPMENT PROCESS 


The ARIES SDSS began as a project to elicit an expert system decision model for 
placement of Army Reserve TPU units using a GIS to display locational decision factors. 
As the users generated additional requirements, the project evolved into an integrated 
application data resident model involving the use of a GIS application and a DSS 
application. Because the original scope of the project centered on the accuracy of the 
decision model, the retrieval of data and data quality issues supporting that model were 


not in the original considerations. 
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This complacency about the underlying data mirrors a recurring theme in the data 
warehouse development literature. Joe Celko and Jackie McDonald indicate the likely 
consequences of such oversights. 

“Ignore or trivialize problems with the existing data at the start of the 

project, and that oversight will brutally assert itself when data problems 


begin to surface as you populate the warehouse from outside data sources, 
current applications, and legacy data.” [Ref. 19:p. 1] 


Unfortunately, this statement was particularly appropriate in the case of the ARIES 
prototype application. Overlooking the migration of the source data to the application was 
a major oversight. This error has led us to recommend a revision of the SDSS 
development process that 1s documented in Chapter II (Figure 8). The major difference 


from the original process 1s the addition of a data migration plan (DMP) discussed below. 
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Figure 8. Recommended SDSS Development Process 
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B. DATA MIGRATION PLAN 


As the ARIES project developed, the data model evolved from merely querying 
source files into a specialized form of a data warehouse. The importance of this transition 
was not well identified or understood by the ARIES development team at the time. 
Because the underlying data have a direct effect on each phase of the development 
process (decision model, data model, and system design), the data resource became a 


critical factor to the success and completion of the project. 


A well thought out quality migration plan for source data can ease the labor of 
application development. [Ref. 20:p. 1] As a crucial piece that integrates the application, 
the migration plan is responsible for transforming, transporting, and scrubbing the data. 


This requires substantial prior planning. 


1. Designate Migration Team 


The most important action to be taken in developing a data migration plan is to 
identify the group of individuals that will be responsible and take ownership for the data 
portion of the project. A team of individuals should be named early in the project 
development that will be responsible for making the required data available to the 
application. These team members should consist of both business and technical 


individuals from the application agency. 


Members of the migration team will be responsible for gathering source files, 
analyzing those files, and maintaining the meta-data/business information about each file 
and its elements. This information about the source files 1s important to the process of 
mapping data elements to the individual decision measures. The understanding that can 
be gained by documenting the business knowledge underlying each data element and 
each data file will allow the development team to implement the business rules with a 
minimum of disruption. As each source file 1s analyzed and identified for use in the 
application, a process should be initiated for maintaining the business information about 


each file. This should include at a minimum the information that can be documented on 
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the Source File Documentation Form in Appendix D. The process of documenting the 
source files should be the responsibility of the business members of the migration team 


who are intimate with the source files. 


The migration team will be responsible for the structure and architecture of the 
centralized data resource file that is based on the intended use of the data in the 
application and the structure of the source files. A migration strategy will be developed 
that includes determining a method of migration to maximize the information available to 


the decision support tool. 


A migration strategy is achieved by determining what type of information will be 
maintained and at what level of detail. The term granularity is used in data warehousing 
to identify the level of detail that is contained in the data warehouse. The granularity is 
determined by either maintaining detailed data elements as they are found in the source 
files or by summarizing those elements to reduce the granularity. As discussed in earlier 
chapters summarizing or aggregating data in the data resource file can improve 
dramatically the performance of certain queries. Determining this strategy early in the 
project life cycle provides the system design team the advantage of a stable data source 


for a most of the overall project. 


Once the design of the decision model is complete, the migration team can begin 
to assist in the development of the data model by assisting in the mapping of data 
elements to decision factors and in developing the business rules. The logic for the 
eventual data extraction process the team will use comes directly from these business 


rules. 


ae Determine Extraction Logic and Generate Extraction Routine 


By being involved in the mapping of the decision measures to actual data 
elements, the migration team can begin the process of designing the logic comprising the 
data extraction process. The migration team will be required to identify the data elements 


required in order to reduce the amount of data that 1s migrated to the data resource file. 
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The logic involved in the extraction process can implement criteria that are in effect for 
any of the decision measures. An example of this logic would be filtering only the active 
reservists from the transaction files by identifying a specific code that is used to mark 


each active reservist. 


The business rules developed in the data model can be used to identify common 
elements of aggregation and structure that are required in the data resource file. 
Extraction routines or queries can be developed to retrieve, transform, and summarize 
data into a format that will optimize the usability of the data resource file by the 
application. The extraction routines can also be used to determine and set the indexed 
fields of the data resource that allow the application’s querying tool to access the stored 


data in an efficient manner. 


Data scrubbing and cleansing for common data errors should also be implemented 
in the extraction routines. The data cleansing and conditioning rules are developed from 


the data quality assessments discussed in the next section. 


3. Quality Assured Data 


As discussed in Chapters III and IV, data quality is an important issue that must 
be investigated very early in a project. The migration team will be directly involved in 
assessing the quality level of data available as well as that required by the proposed 
application. This portion of the migration process can do more to ensure the success of 
the project than any other. The first and most important task that must be undertaken 
pertaining to data quality is the Data Quality Baseline Assessment. Which will be used to 
identify problems with accuracy and inconsistencies of the source data that can be 
integrated into the data preprocessing routines. The baseline will also provide the 
customer and the development team with a deeper understanding of the data that are 


intended for use in the project. 


In order for the migration team to be able to perform an analysis, they must first 


develop a set of quality metrics. This metrics should include, at a minimum, an 
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evaluation of the source files and their ability to meet the six characteristics of data 
quality outlined in Chapter III. By completing this initial assessment, the migration team 
will be able to identify potential problem areas and make a determination about the 


legitimacy of the data to support the decision process. 


In conjunction with the customer, the migration team must determine and 
document the expectations for the quality of data. This expectation level should be based 
on the level of risk the customer is willing to accept if the resulting decision is wrong 
because of the underlying data. Meta-data and data quality metrics will be used to 
document these expectations and will be used on a continued basis to measure the 


performance of the data as it pertains to the expected level of data quality. 


The data errors that result from comparing the source data with the data quality 
metrics will produce additional logic to be included in the extraction phase. This process, 
known as data conditioning or data cleansing will filter out potential data quality 
problems from the source files and thereby improve the data quality in the data resource 
file. This process will not necessarily correct the source of those problems, as quality 
problems can only be fully corrected in the source files. The migration team is 
responsible for identifying and correcting known data quality problems at their source 


wherever possible. 


C. DECISION MODEL DEVELOPMENT 


A decision model is the foundation upon which the entire SDSS application will 
be built. In the SDSS concept, a known decision process is automated by accessing 
available database information. The reliability and trust placed on the outcome of the 
SDSS must be based on the quality of the underlying database information. Ken Orr 
states in his discussion of Data Quality and Systems Theory that the quality of data is a 
function of its use. [Ref. 14:p. 9] In the ARIES SDSS project many of the problems 
encountered during the development process were a result of the fact the data had not 


been used for the purpose of making site location decisions. Future applications should 
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attempt to leverage the data that is being used in current business processes to maximize 


the inherent quality factors arising from frequent use. 


During the development of a decision model for a new application, care should be 
taken to document the associated data that is currently being used for decision-making. 
Identifying these sources will assist the migration team in their efforts to gather and 
present source data of the highest quality. In the case of the ARIES SDSS application, 
the source files were being used by a wide community of different users at USARC 
primarily for operational purposes as opposed to decision support applications. As a 
result, the data files were never obliged to meet the stringent quality standards required of 


DSS applications. 


D. DATA MODEL 


Problems with the underlying data will continue to be a problem in any 
application that attempts to leverage information stored in legacy databases. The goal in 
the future will be to minimize the effect these problems have on the development process. 
Listed below are three areas that can smooth the transition of data into the SDSS 


application and limit the impact on the application process. 


1. Data Standardization 


The elements of source data must be standardized when the legacy data files being 
used in an application are not constructed under the same set of business specifications. 
These inconsistencies in rules and definitions may lead to problems when the actual data 
are being interpreted out of the original context in which it was defined. Data 


standardization is achieved by logically identifying, grouping, and classifying data. 


This lack of standards for the data attributes in existing applications can manifest 
in many ways. For example, data elements in different files may not label the same field 
in the same manner. Fields with the same name may not contain the same information 


because of differences in usage by different customers. Examples from the ARIES SDSS 
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project are the data fields that represent the Unit Identification Codes which were labeled 
differently in all of the following manners: UIC1, UIC, CURR _VIC, OWN UIC. 
Furthermore, the UIC field was used to identify the same billet in different ways. The 
source file that lists the actual billets or jobs at a reserve facility was marked with the use 
of a parent UIC that identified the facility itself. The individuals actually assigned to a 
billet in the personnel file, on the other hand, are listed with a UIC that identifies the 
actual platoon to which the individual is assigned. Such inconsistencies make it difficult 
to verify one file against another. In this case it would be impossible to identify if there 
were a specific individual assigned for every billet in the billet structure file or whether 


the billet to which an individual is assigned is valid. 


The purpose of data standards is to facilitate common use and understanding in 
identifying data characteristics. All parties involved in the project must be able to 
interpret the same information in exactly the same way. This will allow the development 
to be consistent and remove the need for each individual to have a deep understanding of 


each data file. 


Data standardization must be conducted during the extraction process. The rules 
associated with the standardization specifications will be implemented in the extraction 
routines developed by the migration team. These specifications become part of the 
transformation process and the application can then be developed without concern about 


knowledge of the individual structure of source files. 


pap Meta-data Documentation Process 


Understanding the information about the source databases, i.e., the meta-data, will 
allow the application to maximize the use of information in the data files. There were 
many times when development of the application stalled while the development team 
waited for insight about one or more source data files to be provided by the customer. 
Meta-data has two parts; (1) the detailed information about the data elements, their 


formats, length and so on, and (2) the business information and understanding about the 
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data file. The business information documents the rules involved in populating the data 


file, what the data file represents, and the criteria for each data element. 


An important part of this documentation process is identifying the ownership of a 
source file and maintaining a knowledgeable point of contact for each file. These 
individuals will be responsible for documenting and maintaining the meta-data 


throughout the lifecycle of the project. 


Michael Brackett describes the need for a meta-data warehouse that goes beyond 
the traditional data information storage repositories and provides a “personal help desk” 
for increasing the awareness and understanding of the data resource.[Ref. 3:p. 193] This 
concept would allow the user to access indexed information about the source file and 
therefore maximize his/her ability to identify what data are available. Using meta-data to 
the fullest extent possible would benefit the SDSS concept by allowing the decision- 
maker to tap the maximum amount of knowledge available in the source data files. The 
intent here is not create a meta-data documentation project but to provide the maximum 
amount of available information concerning the data files to the development team. This 
tedious and time consuming project will pay dividends in the quality of the output from 


the application. 


3. Identify Spatial Aspects of Queries 


“Over 80% of business data have some spatial context such as a customer address, 
ZIP Code, or location.” [Ref. 5:p. 1] Taking advantage of the spatial aspects of the 
underlying data can provide valuable information to the decision process as well as 
enhance the performance of the final product. Identifying the decision measures and their 
associated business rules that rely on a spatial aspect can be used to create a performance 


advantage. 


During the ARIES SDSS project, complex queries that involved determining if 
one entry in a table existed in another table were found to have definite spatial aspects. 


The ARIES application realized a twenty-fold increase in query performance by simply 
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allowing the GIS application to conduct that portion of the query related to spatial 


parameters. 


Using the advantages that a GIS system provides to localize data will allow the 
SDSS application to access larger quantities of data in a shorter amount of time. The 
ability of a SDSS application to access large quantities of data efficiently will allow 
incremental improvement of the underlying decision process. This spatial component 
may also allow the decision-maker to introduce new decision measures that can enhance 


the final outcome of the SDSS. 


E. SYSTEM DESIGN 


Lessons learned from the system design portion of the ARIES project are 
discussed in a thesis being prepared concurrently by LT Peter Falk. As the principal 
designer of the UI application, his thesis provides a detailed discussion of the issues and 


challenges required to complete the ARIES SDSS prototype application. 


F. FUTURE CONSIDERATIONS 


The concept of an SDSS application is still evolving. The ARIES prototype 
application has proved the viability of an asset that integrates a GIS system and DSS tools 
to leverage the knowledge maintained in legacy databases for decision-making purposes. 
Enhancements to be incorporated in methodologies used in future SDSS applications can 
be separated into the phases of the development process; Decision Model, Data 


Migration, Data Model, System Design, and Testing. 


1. Decision Model 


Discussions of improvements to the decision model have been discussed fully in 


Reference 1 and Reference 2. 
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Des Data Migration 


The migration process implemented in the ARIES prototype only supports the 
application in its current configuration. The migration application, Aries Administrator, 
does not allow for additional data files or queries. This limitation will hinder the ability 
of the Administrator to support the ARIES application if any portion of the decision 
hierarchy is changed. Consideration should be given to adapting the Administrator 
application to allow the ability to add and remove files and queries from the system. The 
Administrator currently only allows for a complete extraction of every data file associated 
with the application, a time consuming event that is not necessary if every data file has 
not changed. A situation may arise where only one data file has changed; in this case, the 
Administrator application should be adapted to allow the user to conduct an intelligent 
update by choosing the data files that require updating. By documenting the update 
frequency of a data file in the meta-data (i.e., weekly, monthly, etc.), the Administrator 


could identify data files that have not been updated. 


Based on the importance of the quality of data migrated to the data resource file, it 
will be important for future implementations of the SDSS methodology to have an 
automated method for determining and maintaining data quality. The migration process 
is the phase in which the rules associated with the quality metrics can be used to 
determine the quality of the underlying process. By automating this process, exceptions 
can be generated during the migration process that will identify known problems. The 
migration engine would be able to provide an estimation of the quality of data and 


determine whether that level is acceptable based on the expectations provided by the user. 


3. Data Model 


Because the idea of a useful decision support tool involves the ability to be 
flexible as the decision process changes, an automated application such as ARIES must 
be able to adapt to that changing environment. There should be a system in place that 


will identify changes to any part of the application, (i.e., legacy data, decision model, data 
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quality requirements, etc.) and capture the effects of those changes on the application. 
For example, if a legacy database changes in any way the migration process should be 
adjusted to reflect those changes as well. Also, if a new decision measure is added, the 
data and meta-data to support that measure should be added to the data resource file. 
Flexibility of the application will be the key to its longevity. If the application cannot be 
updated easily as the business process or environment changes it will die a certain and 


swift death. 


Data values that do not change over long periods of time and are used to support 
decision measures should be calculated only as those values change and not during each 
evaluation session. Values are calculated during each session for measures such as 
Distance to Recruit Station and Distance to ECS that do not change on a frequent basis. 
This calculation process could be moved from the evaluation process into the data 
migration process by computing a value that is pre-calculated for all possible sites. Early 
identification of values that change infrequently will reduce the overhead required in the 


system design portion of the project. 


4. System Design 


The ARIES SDSS prototype application instituted the use of an error value to help 
identify data that were missing or returned null (1.e., -999). This was effective for 
identifying a number of potential problems and provided valuable, albeit limited, 
feedback from the system directly to the user. However, this concept must be expanded 
to include other error codes for a more detailed feedback system. The error codes should 
kept to a minimal list of highly useful codes. For example, other error codes should be 
used to identify a value of zero for a measure that should not be zero (i.e., Facility Age). 
Another example would be additional error values that can signify different types of 
problems resulting from the calculations, e.g., if Number Assigned returned for a unit is 
null or zero and a value is found for Losses or Transfers for the same unit, then an error 
exists in one of the files. Currently the ARIES application assigns a zero to the value of 


the calculation to avoid a division by zero. Adding an error value to identify 
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inconsistencies between data files would provide the user with a possible reason for the 
values of Loss Rate or Transfer Rate being zero. This type of error detection would 
require the application to have intelligent business rules that document the relationships 
between decision measures. This same process could also be used to identify possible 


problems with ratio values discussed in Chapter IV. 


Other considerations were made to allow the user the ability to choose a default 
value for measures that returned with error values. The user could choose to use a 
previously determined value such as the mean. This would allow the decision model to 


include this measure in the evaluation of the site. 


5. Testing 


The implementation and testing phase will continuously monitor and evaluate the 
results of the system and provide feedback to the application. The application must have 
the ability to monitor and evaluate the results and provide feedback to the system to 
improve the process. During the ARIES project, a problem like this was identified in the 
post implementation phase. Every value the system returned for Facility Condition, 
Measure #4, was the same value, GREEN. This resulted in each site receiving the 
maximum utility for that measure and effectively nullifying any benefits that measure 
provided to the decision model. A problem like this could be identified by observing the 
trends of the answers and having the testing module send a flag to the user that reports a 
problem with a decision measure. The application would provide some basic information 
to assist in diagnosing the problem. A full testing of the application requires a formal 
feedback mechanism that will allow problems to be documented as well as the solutions 


and corrections to be documented. 


G, CHAPTER SUMMARY 


An inspection of the final ARIES SDSS prototype application can provide future 


implementations of the SDSS methodology with valuable information. The oversights, 
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problems, and mistakes discovered during the design and development of this “proof of 
concept” application has led to recommended changes to the development process. 
Among the changes are the addition of a DMP and minor refinements to the decision 


model, data model, and system design phases. 


The need for a DMP was unfortunately realized too late in the ARIES project. 
Many of the stumbling blocks in the development could have been avoided had there 
been an integrated plan for the movement of data from its source to the DSS application. 
A detailed DMP will involve assigning responsible individuals to gather data, develop 
extraction logic, and generate extraction routines or queries. This process will provide a 
stable source of data as a foundation for the application. The migration team will also be 
responsible for evaluating the baseline quality of the data set and generating a plan to 


reach the desired level of quality in the final product. 


Development of the decision model drives the entire SDSS development process 
and should be given the proper amount of attention. The failure of the ARIES project to 
identify sources receiving frequent use required the development team to spend valuable 
time validating and correcting the source data files. It is important to identify in the 


decision elicitation process the data elements that are currently being used in a system. 


Because the data files used by the ARIES project were made up of a collection of 
large legacy data files from varying sources, there was a need for all members of the team 
to have the same understanding of what comprised these files. A system to maintain 
detailed information about the data files, ic. meta-data, was required. Detailed 
documentation should accompany the transfer of a source file from the customer to the 
development team. Standardizing the common data elements in the data resource file 
provides the application with future flexibility. The ARIES project was able to leverage 
the spatial aspects inherent the underlying data that were directly associated with decision 
measures. Future SDSS implementations should make every attempt to harness the 


spatial aspects of the data. 
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As the SDSS methodology is used in future implementations, the need to add 
increased flexibility and feedback to the user will continue to enhance the usability of the 
final product. Future implementations will concentrate more attention on the quality of 
data and ability of the increased intelligence in the system to provide the user with 


highest quality decision support tool available. 


ia 





Vi. CONCLUSION 


A. SUMMARY 


Developing a Spatial Decision Support System (SDSS) for the Army Reserve 
TPU relocation decision problem provided insight into new methods to improve the 
development methodology for a SDSS. The Army Reserve Installation Evaluation 
System (ARIES) is the result of using this SDSS methodology to integrate a detailed 
decision model in an automated DSS. The system integrated two commercial software 
programs, Logical Decisions for Windows™ as a decision model solver and MapInfo™ 
as GIS mapping engine. A user interface (UI), created in Visual Basic™, served as an 


integration tool for retrieving data and passing information to between these components. 


The system architecture developed for the ARIES project consisted of a decision 
model, a data model, and an integrating application. The decision model was developed 
under separate research and constituted the basis for gathering the required data to 
evaluate the readiness of an Army Reserve facility. The decision measures developed in 
the decision model generated a set of business rules that were mapped to actual data 
elements. Because of the complexity of the queries, the business rules and the quantity of 
data that was involved, the development team identified a need for a centralized data 
resource file. The ARIES data resource file used data warehousing techniques to conduct 
extractions from the many source data files, and was optimized for the ARIES decision 
process. A data preprocessing application was created to generate this data resource file. 
The ARIES Administrator is a Visual Basic™ application that acts as a migration engine 


to transform the source data into the structure required by the ARIES application. 


Because of the spatial nature of the decision model involved, the ARIES data 
resource file used basic data warehousing techniques, such as aggregation and 
summarization, to take advantage of the spatial attributes of the source data. This 
spatially enabled data set is a special form of a data warehouse or data mart called a 


spatial data warehouse. The spatial aspects of the data were used in conjunction with the 
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GIS application to maximize query performance. The primary advantage of using 
geocoded (i.e., spatially identified) information in the queries was a significant increase 


in performance for the ARIES application. 


Through the use of a spatial data warehouse, the SDSS is buttressed with a stable 
data source engineered to provide the underlying decision model with the highest quality 
data in a timely manner. The integration of a data migration plan (DMP) in the system 
development process ensures that the data resource generated for the application allows 


the SDSS application to generate meaningful outcomes. 


B. CONTRIBUTIONS 


This research implemented the theoretical SDSS methodology by creating an 
integrated application in support of complex site location decisions. As a proof of 
concept application, ARIES demonstrates the ability to integrate a GIS mapping engine 
and a decision model solver in a seamless and flexible environment that allows users to 
leverage operational legacy database information for decision-making purposes. At an 
applied level, this research identified additional requirements necessary during the 
development process to provide SDSS applications with stable and accurate data sources 
of acceptable quality. These additional requirements involved the development of a data 


migration plan (DMP) and the implementation of a data quality assessment plan. 


1. General Contributions 


In addition to the specific benefits afforded to USARC, this project identified 
enhancements to existing SDSS methodology development to ensure data quality. Most 
important is the requirement to transport and transform the underlying data into a format 
that allows the SDSS application to access that data in the most efficient manner. The 
DMP that is outlined in this paper provides the basis for a data resource to instantiate a 
decision model in a fashion that improves performance and assures a confidence in of the 


outcome. 
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Data quality was identified as a limiting factor of the SDSS application too fully 
analyze the site evaluation as well as provide an outcome that is credible to the user. This 
identified the need to incorporate the evaluation and assessment of source file data quality 
as a continuing effort throughout the development process. An important element in 
correcting and maintaining an expected level of quality for data is the assignment of 
individual responsibility for identifying and correcting the inadequacies of the source data 


files. 


2. Specific Contributions to USARC 


The primary benefit of the ARIES project is the use of a very powerful decision 
tool to provide the decision-maker with detailed information previously not available. By 
implementing the detailed and complex queries that provides values for the ARIES 
decision measures, USARC has benefited by being able to analyze this information. The 
ARIES application goes far beyond mere data retrieval, allowing the decision-maker to 
manipulate the results of these complex queries in a highly flexible and fully functional 


decision environment. 


This research showed by detailed analysis that 14 of the 20 decision measures that 
have been automated will return a valid value more than 90% of the time. Further data 
quality analysis would provide the USARC Readiness team with the assurance that the 
ARIES application is basing its outcome on data that are accurate, consistent, complete, 


timely, and unique, as well as valid. 


Through implementation of the Administrator, USARC has benefited from spatial 
data warehousing techniques to improve performance of the system by centralizing the 
data elements required for the TPU relocation decision problem. The Administrator 
provides a stable data set to the ARIES application by using queries that can be repeated 
time after time as the source files change. The Administrator also provides an automated 


data quality filter that facilitates data cleansing of the source data sets. 
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Even without an implementable SDSS application, USARC has received the 
benefit of an in depth look at the data files they are using in their everyday decision- 
making. It has forced the group of experts to verify and validate the assumptions they 


may have made concerning the site location decision problem. 


The real value of this research may lie in the basis it provides for future SDSS 
applications to increase access to decision information directly from legacy data sources. 
Developing a strategy to provide SDSS with high quality data creates a foundation for a 


much higher probability of successful implementation of decision-based systems. 
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APPENDIX A. DECISION MODEL MEASURES 


This appendix contains detailed information about each decision measure that was 


automated in the ARIES SDSS prototype application. The information includes a description of 


each measure, the business rule used to calculate the associated value, base units, source files, 


associated ACROPOLIS tables, query or queries involved in the calculation, a description of the 


yield curve, 


and a graph of the yield curves. “ACROPOLIS” is the file name for the ARIES data 


resource file. 


The term “in the area” in this Appendix is defined as being with in a 50-mile radius of the 


moving unit or proposed facility. 


Measure 1. 
Measure 2. 
Measure 3. 
Measure 4. 
Measure 5. 
Measure 6. 
Measure 7. 
Measure 8. 
Measure 9. 


Measure 10. 
Measure 11. 
Measure 12. 
Measure 13. 
Measure 14. 
Measure 15. 
Measure 16. 
Measure 17. 
Measure 18. 
Measure 19. 
Measure 20. 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 1. Facility Backlogged Maintenance 
Facility Backlogged Maintenance provides the total dollar value of backlogged 
maintenance. This provides an indication of the initial investment required to 
correct the significant maintenance problems with a proposed facility. 
The Backlogged Maintenance value is based upon the sum values for maintenance 
actions documented for each facility in the “CWE TOTAL” field of the 
RPINFODT file. The summation is done during the data extraction phase. 
Maint_Cost[Sum of outstanding maintenance actions for a facility] 


Dollars 


RPINFODT 


ACROPOLIS Table(s): RPINFODT_ 


Query: 


Yield Curve: 


Maint Cost 

SELECT MAINT COST 

FROM ~~ RPINFODT_ 

WHERE RPINFODT_.FACID = ProposedFacility FAC ID 


A linear relationship is assumed between the backlogged maintenance costs and 


utility. Every dollar required or saved in this category is expected to have equal 
utility to a relocating unit. 


Utility 


uN 1,000,000 
Dollars 


Max Utility: 0 Min Utility: > 1,000,000 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 2. Facility Operating Costs 
Facility Operating Costs provide an indication of the financial resources that are 
required to maintain the facility in a serviceable condition. This includes both 
utilities and minor maintenance costs. 
Operating Costs are extracted from the “COST PR_SF” field of the FPS file. 
COST _PR_SF[Retrieve the Cost per Square Foot for a facility] 


Dollars per square foot per month 


FPS 


ACROPOLIS Table(s); FPS_ 


Query: 


Yield Curve: 


COST _PR_SF 
SELECT COST PR SF 

FROM FPS_ | 

WHERE FPS_.FACID = ProposedFacility. FAC_ID 


A linear relationship is assumed between the operating costs and utility. Every 
dollar required or saved in this category is expected to have equal utility to a 
relocating unit. 


Utility 


100 
Dollars/square foot per month 


Max Utility: 0 Min Utility: > 100 
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Measure 3. Facility Age 

Definition: | This measure indicates the age of the primary structure on the proposed relocation 
site. It is intended to reflect an assumed long term structural degradation with 
time. 

Calculation: Facility age is calculated based upon the acquisition date found in the INTEREST 
file. The acquisition date is compared to the current date and the difference is 
determined in months. 

DATE ACQ{[Current Year - Date Acquired] 

Units: Months 

Source File: INTEREST 

ACROPOLIS Table(s): INTEREST _ 

Query: DATE ACQ 
SELECT DATE ACQ 
FROM — INTEREST _ 

WHERE INTEREST_.FACID = ProposedFacility.FAC_ID 


Yield Curve: A linear relationship is used between facility age and utility. 


Utility 


0 1,200 
Months 


Max Utility: 0 Min Utility: > 1,200 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 4. Facility Condition 


Facility Condition is based upon a visual inspection of the structure and provides 
an indication of the serviceability of the primary structures. 


This measure is based upon the ISR part | rating entered in the “FAC COND” 
field of the FPS file. 


FAC COND[Retrieve Facility Condition] 
No Units(Green, Amber, Red) 


FPS 


ACROPOLIS Table(s): FPS __ 


Query: 


Yield Curve: 


FAC COND 

SELECT FAC COND 

FROM ~— FPS __ 

WHERE FPS_.FACID = ProposedFacility. FAC_ID 


The utility of these three categories varies in discrete steps. A facility that 1s 
categorized as “green” is judged to be approximately twice a desirable as one that 
is assigned an “amber” rating. 


1.0 


Utility 0.5 





RED AMBER GREEN 


Max Utility: GREEN Min Utility: RED 
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Definition: 


Calculation: 


Units: 


Source Data: 


Measure 5. Facility Ownership 


This measure indicates whether the facilities at a proposed relocation site are 
leased or owned. 


Facility Ownership is based upon the entry in the “GOVT_OWN?” field of the 
COMBEEX file. 


GOVT _OWN{Retrieve Ownership Status] 
No Units(Yes/No) 


COMPLEX 


ACROPOLIS Table(s); COMPLEX _ 


Query: 


Yield Curve: 


GOVT_OWN 

SELECT GOVT_OWN 

FROM COMPLEX __ 

WHERE COMPLEX_.FACID = ProposedFacility.FAC ID 


Facilities that are owned by the government are preferred as relocation sites over 
those facilities that are leased. The owned sites are assigned the maximum utility 
value of 1.0, while leased sites are given a 0 utility score. 


Utility |. 


NO YES 


Max Utility: Yes Min Utility: No 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 6. Competition 


This measure provides an indication of the level of competition for potential 
reservists. It considers only Army Reserve and Army National Guard units in the 


area of the relocation site. 


Competition 1s determined by the number of positions that must be filled by all 
other Army Reserve and Army National Guard (ARNG) units in the area of the 
proposed relocation site. For Army Reserve units, the number of required 
positions is determined by counting the number of records in the G19TRUE file 
associated with each UIC in the area. For ARNG units, the value is found in the 


“AUTH?” field of the NGNON_CL file. 


NO_AUTH_NG[Number Authorized National Guard] + 
NO_REQD[Number Area Reservists Required] 


Number of competing positions 


COMMAND PLAN, G17, GI9TRUE, GEOREF, NGNON_CL 


ACROPOLIS Table(s): CMDPLAN, G17Natl, G19Natl, VALID _UIC 


Query: 


Area-FACID List(MapInfo) 

SELECT FAC _IDINTO TempFACID 
FROM  GEOREF 

WHERE Object Within ObjAreaBuffer 
ORDER BY FAC [ID 

(Note: ObjAreaBuffer is equal to 300 miles) 


VALID UIC 

SELECT UIC, FAC_ID, UnitName, City, State, Zip 

FROM ~ GI7Natl 

WHERE) G17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Area-UIC List 

SELECT DISTINCT VIC INTO AREA UIC 

FROM VALID UIC 

WHERE VALID UIC.FAC ID = ANY (SELECT AREA_FACID.FAC_ID 
FROM AREA FACID) 


NO_AUTH_NG(MapInfo) 

SELECT * INTO TempNGUnits 
FROM NON_CLOS 

WHERE Obj Within ObjAreaBuffer 
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Yield Curve: 


SELECT SUM(AUTH) “No _AUTH_NG” INTO Strength 
FROM  TempNGUnits 


NO_REQD 

SELECT SUM(UIC TOTAL) AS TOTAL REQD 

FROM — G19Natl 

WHERE G19Natl.UIC = ANY (SELECT AREA. _UIC.UIC 
FROM AREA UIC) 


A linear relationship exists between the number of competing positions from other 


units and the utility of a relocation site. The level of no site utility in this measure 
begins at 10,000 positions which 1s above the maximum value expected. 


1 


Utility 


Competing Positions Tree 


Max Utility: 0 Min Utility: > 10,000 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 7. Average Area Drill Attendance 


This measure indicates the fraction of reservists with satisfactory drill attendance 
for all existing units in the area of the proposed relocation site. Areas with a high 
fraction of satisfactory drill attendance are preferred relocation sites because units 


relocated to that area are assumed to perform similarly in drill attendance. 


This measure considers the last four quarters of data contained in the FINANCE 
file. After initial screening, the number of reservist with 21 or more drill periods 


for the year is divided by the total number of people who meet the screening. 


DRILL_SAT [Number of reservists with > 21 drill periods in a year] 
DRILL_TOTAL [Number of reservists required to drill] 


Ratio 


COMMAND PLAN, FINANCE, G17, GI9TRUE, GEOREF 


ACROPOLIS Table(s): CMDPLAN, FINANCE_, FINANCE QTR, G17Natl, G19Natl, 


Query: 


VALID UIC 


Area-FACID List(MapInfo) 

SELECT FAC IDINTO TempFACID 
FROM GEOREF 

WHERE Object Within ObjAreaBuffer 
ORDER BY FAC ID 

(Note: ObjAreaBuffer is equal to 300 miles) 


VALID_ VIC 

SELECT UIC, FAC_ID, UnitName, City, State, Zip 

FROM ~~ GI17Natl 

WHERE G17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Area-UIC List 

SELECT DISTINCT UIC INTO AREA UIC 

FROM  VALID_UIC 

WHERE VALID UIC.FAC_ID=ANY (SELECT AREA FACID.FAC_ID 
FROM AREA FACID) 


Z| 


Yield Curve: 


FINANCE CY 
SELECT UIC, COUNT(UIC) AS UIC_TOTAL INTO FINANCE CY 
FROM FINANCE QTR 
WHERE (Select Case) 
Case 1“ Qtr FY 
(UTAIQIPF + UTA2Q1PF + UTA3QIPF + UTA4Q1PF) > 20 
Case 2™ Qtr FY 
(UTA2Q1PF + UTA3QIPF + UTA4QIPF + UTAIQCFY) > 20 
Case 3" Qtr FY 
(UTA3Q1PF + UTA4QIPF + UTAIQCFY + UTA2QCFY) > 20 
Case 4" Qtr FY 
(UTA4QIPF + UTAIQCFY + UTA2QCFY + UTA3QCFY) > 20 
GROUP BY UIC 
ORDER BY UIC 


DRILL-SAT 

SELECT SUM(UIC_TOTAL) AS TOTAL SAT 

FROM FINANCE CY 

WHERE FINANCE _CY.UIC = ANY (SELECT AREA _UIC.UIC 
FROM AREA UIC) 


DRILL-TOTAL 

SELECT SUM(UIC_TOTAL) AS DRILL_TOTAL 

FROM  FINANCE_ 

WHERE FINANCE_.UIC = ANY (SELECT AREA_UIC.UIC 
FROM AREA _UIC.UIC) 


The utility of the average drill attendance rate increases linearly between the 
values of 0 and 0.6. Above that point, increases in the attendance rate result in 
diminishing returns. Values above 0.6 become increasingly uncommon. 


Utility 


1 
Satisfactory Attendance / Total Reservists 


Max Utility: 1.0 Min Utility: 0.0 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 8. Area Loss Rate 


This measure indicates the fraction of reservists who left the reserves in the 
previous fiscal year, for all existing units in the area of the proposed relocation 
site. Areas with a low loss rate are preferred relocation sites because units 


relocated to that area will also experience low loss rates. 


The number of losses to units in the area in the previous fiscal year is divided by 
the number of reservists currently assigned to these units. Losses are identified 
through the transfer mnemonic field (TRMN=”LOSS”) of the FyxxLOSS file. 
The number of assigned reservists is determined by counting all of the personnel 


records in the G18CWE file associated with each UIC in the area. 


NO_LOSSJ[Total Number of Losses in the last year] 
NO_ASSN[Total Number Reservists Assigned] 


Ratio 


COMMAND PLAN, FYxxLOSS, G17, GI8CWE, GEOREF 


ACROPOLIS Table(s): CMDPLAN, FYxxLOSS, G17Natl, G18Natl UIC, VALID_UIC 


Query: 


Area-FACID List(MapInfo) 

SELECT FAC_IDINTO TempFACID 
FROM GEOREF 

WHERE Object Within ObjAreaBuffer 
ORDER BY FAC ID 

(Note: ObjAreaBuffer is equal to 300 miles) 


VALID UIC 

SELECT UIC, FAC _ID, UnitName, City, State, Zip 

FROM  GI17Natl 

WHERE G17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Area-UIC List 

SELECT DISTINCT VIC INTO AREA UIC 

FROM  VALID_UIC 

WHERE VALID _UIC.FAC_ID = ANY (SELECT AREA_FACID.FAC_ID 
FROM AREA FACID) 


NO_ ASSN 

SELECT SUM(UIC TOTAL) AS TOTAL_ASSN 

FROM  G18Natl_UIC 

WHERE G18Natl UIC.UIC = ANY (SELECT AREA_UIC.UIC 
FROM AREA UIC) 
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Yield Curve: 


NO_LOSS 

SELECT SUM(UIC TOTAL) AS TOTAL_LOSS 

FROM  FYxxLOSS 

WHERE FYxxLOSS.UIC = ANY (SELECT AREA _UIC.UIC 
FROM AREA UIC) 


This function includes both concave and convex regions. The inflection point 
occurs at a loss rate of .33 and a utility of 0.5. Based on experience, a loss rate of 
one third per year was considered to be typical. Any loss rate below this value 


has relatively high utility, whereas loss rates above the inflection point quickly 
approach a utility of zero. 


Utility 


Losses / Total Number of Reservists 


Max Utility: 0 Min Utility: 1 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 9. Area Transfer Rate 


This measure indicates the fraction of reservists who transferred to different units 
in the previous fiscal year for all existing units in the area of the proposed 
relocation site. Areas with a low transfer rate are preferred relocation sites 


because units relocated to that area will also experience low transfer rates. 


The number of transfers in the previous fiscal year is divided by the number of 
reservists currently assigned to the unit. Transfers are identified through the 
transfer mnemonic field (TRMN="TRFD’) of the FyxxLOSS file. The number of 
assigned reservists 1s determined by counting all of the personnel records in the 


GI8CWE file associated with each VIC. 


NO_XFER[Total Number of Transfers in the last year] 
NO_ASSN[Total Number Reservists Assigned] 


Ratio 


COMMAND PLAN, FYxxLOSS, G17, GI8C WE, GEOREF 


ACROPOLIS Table(s): CMDPLAN, G17Natl, G18Natl UIC, FYxxXFER, VALID UIC 


Query: 


Area-FACID List(MapInfo) 

SELECT FAC _IDINTO TempFACID 
FROM GEOREF 

WHERE Object Within ObjAreaBuffer 
ORDER BY FAC ID 

(Note: ObjAreaBuffer is equal to 300 miles) 


VALID UIC 

SELECT UIC, FAC _ID, UnitName, City, State, Zip 

FROM ~~ GI17Natl 

WHERE G17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Area-UIC List 

SELECT DISTINCT UIC INTO AREA UIC 

FROM VALID VIC 

WHERE VALID _UIC.FAC_ID=ANY (SELECT AREA _FACID.FAC ID 
FROM AREA _ FACID) 
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Yield Curve: 


NO_ ASSN 

SELECT SUM(UIC TOTAL) AS TOTAL_ ASSN 

FROM  G18Natl_UIC 

WHERE G18Natl UIC.UIC = ANY (SELECT AREA UIC.UIC 
FROM AREA UIC) 


NO_XFER 

SELECT SUM(UIC_TOTAL) AS TOTAL XFER 

FROM  FYxxXFER 

WHERE FYxxXFER.UIC = ANY (SELECT AREA _UIC.UIC 
FROM AREA UIC) 


This function includes both concave and convex regions. The inflection point 
occurs at a loss rate of .33 and a utility of 0.5. Based on experience, a transfer rate 
of one third per year was considered to be typical. Any loss rate below this value 
has relatively high utility (close to 1.0), whereas loss rates above the inflection 


point quickly approach a utility of zero. 


Utility 


: Number of Transfers / Total Number Assigned 


Max Utility: 0 Min Utility: 1 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 10. Area Average Manning 


This measure indicates the ability to fill the required positions. An average value 
is determined for all existing units in the area of the proposed relocation site. 
Areas with high average manning levels are preferred relocation sites because 


units relocated to that area will also experience high manning levels. 


The number of reservists assigned to area units (based upon the number of 
personnel records in GI8CWE file associated with each UIC) is divided by the 
number of required positions (based upon the number of positions in the 
G19TRUE file associated with each UIC). An average is calculated for all UIC’s 


in the area of the proposed site. 


NO_ASSN[Total Number Reservists Assigned] 
NO_REQD[Number Area Reservists Required] 


Ratio 


COMMAND PLAN, G17, GI8CWE, GI9YTRUE, GEOREF 


ACROPOLIS Table(s): CMDPLAN, G17Natl, G18Natl_UIC, G19Natl, VALID_UIC 


Query: 


Area-FACID List(MapInfo) 

SELECT FAC IDINTO TempFACID 
FROM GEOREF 

WHERE Object Within ObjAreaBuffer 
ORDER BY FAC _ID 

(Note: ObjAreaBuffer is equal to 300 miles) 


VALID _ UIC 

SELECT UIC, FAC_ID, UnitName, City, State, Zip 

FROM  GI17Natl 

WHERE G17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Area-UIC List 

SELECT DISTINCT UIC INTO AREA UIC 

FROM  VALID_UIC 

WHERE VALID UIC.FAC_ID=ANY (SELECT AREA _FACID.FAC_ID 
FROM AREA FACID) 
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Yield Curve: 


NO ASSN 

SELECT SUM(UIC_TOTAL) AS TOTAL ASSN 

FROM  G18Natl_UIC 

WHERE G18Natl_UIC.UIC = ANY (SELECT AREA. _UIC.UIC 
FROM AREA UIC) 


NO REQD 
SELECT SUM(UIC_TOTAL) AS TOTAL REQD 
FROM ~ G19Nat! 


WHERE G19Natl UIC = ANY (SELECT AREA _UIC.UIC 
FROM AREA UIC) 


It is desirable that area units be able to exceed their minimum manning 
requirements. All manning levels above 125% are considered to have maximum 
utility. Manning levels below this value drop off quickly in terms of utility. 


Utility 


Assigned reservists / Required reservists 


Max Utility: 1.25 Min Utility: 0 
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Definition: 


Calculation: 


Units: 


Source Data: 


Measure I1. Distance to Nearest Recruit Station 


Distance to the nearest Recruiting Station provides one indication of recruiter 
effectiveness. 


The straight-line distance from the proposed site to the closest recruiting station is 
calculated using a geocoded version of the RZA file. 


DIST_RZA[Determine distance to nearest Recruit Station] 
Miles 


RZA 


ACROPOLIS Table(s): NONE 


Query: 


Yield Curve: 


DIST _RZA(MapInfo) 

SeeberT* 

FROM = RZA 

WHERE Obj Withing ObjDistanceBuffer into TempRZA 
(Note: ObjDistanceBuffer is equal to 300 miles) 


SELECT Distance((CentroidX(Obj), Centroid Y(Obj), FacIDLat, FacIDLong, “mi’’) 
FROM  TempRZA 
ORDER BY Distance INTO TempRZA.Dist 


The effectiveness of a recruiting station in filling positions at a reserve unit is 
fairly high if the two are within a half hour drive of each other. It is assumed that 
recruiters are most effective in the area close to their recruiting station and that 
reserve recruits must be located near the unit with which they will serve. A 
distance of 30 miles is assigned an average utility of 0.5. A small change in 
distance results in less change in desirability when the distance is very small or 
very large than it does when the distance is around 30 miles. 


1 


Utility 


Miles 


Max Utility: 0 Min Utility: > 100 
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Definition: 


Calculation: 


Measure 12. Available Transfers from Closing Units 


This value indicates the total number of personnel assigned to closing units within 


50 miles of the proposed site. 


A list of Unit Identification Codes (UIC’s) 1s created which contains only those 
units scheduled to close within 18 months. These units are identified by an entry 
of 5B in the “Tier” field of the G17 file. The number of potential transfers from 
closing units is calculated by summing the number of records in the GI8CWE 
database for the closing units which are located in the area of the proposed 


relocation site. 


TOTAL _AVAIL[Total Number of Available Reservists from Area Closing Units] 


Units: 


Source File: 


Ratio 


COMMAND PLAN, G17, GI8CWE, GEOREF, US_ZIPS(MapInfo) 


~ACROPOLIS Table(s): CMDPLAN, G17Natl, VALID_UIC 


Query: 


Area-FACID List(MapInfo) 

SELECT FAC JIDINTO TempFACID 
FROM GEOREF 

WHERE Obj Within objAreaBuffer 
ORDER BY FAC ID 

(Note: objAreaBuffer is equal to 300 miles) 


VALID. UIC 

SELECT UIC, FAC ID, UnitName, City, State, Zip 

FROM  GI17Natl 

WHERE G17Natl.UIC = ANY (SELECT CMDPLAN.UIC 
FROM CMDPLAN) 


Area-UIC List 

SELECT DISTINCT UIC INTO AREA UVIC 

FROM VALID _UVIC 

WHERE VALID UIC.FAC_ID = ANY (SELECT AREA FACID.FAC ID 
FROM AREA FACID) 


AREA CLOS_UIC 
SELECT UIC 
FROM  G17Natl 
WHERE G17Natl.TIER = “5B” 
AND G17Natl.UIC = ANY (SELECT AREA_UIC.UIC 
FROM AREA UIC) 
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Yield Curve: 


Max Utility: 
Min Utility: 


AREA ZIPCODE(MapInfo) 
SELECT ZIPECODE 

FROM ~~ US _ ZIPS 

WHERE Obj Within objAreaBuffer 
ORDER BY ZIP CODE 


Area G18 ZIP(MapInfo) 

SELECT DISTINCT UIC, ZIPCODE, COUNT(UIC) AS UIC_ TOTAL 
FROM GI18CWE 

GROUP BY UIC, ZIPCODE 

ORDER BY UIC, ZIPCODE 


TOTAL_AVAIL 
SELECT SUM(UIC_TOTAL) AS TOTAL AVAIL 
FROM ~ Area G18 ZIP 
WHERE Area_G18 ZIP.UIC = ANY (SELECT AREA CLOS_UIC.UIC 
FROM AREA CLOS_ UIC) 
AND Area_G18_ ZIP.ZIPCODE = ANY (SELECT AREA_ZIPCODE.ZIP 
FROM AREA _ ZIPCODE) 


The shape of this function assumes diminishing returns in the number of transfers 
available. Experience suggests that for an average unit of 100 people, 
approximately half have prior reserve experience and that approximately half of 
the people in a closing unit will be able to transfer their skills directly to a new 
unit. The value of the first 100 reservists increases at a nearly linear rate because 
they provide preferred fills for approximately 50 of the positions of the moving 
unit. A value of 100 personnel is assigned a utility of 0.9. The incremental value 
added by each additional person over 100 continues to drop until no marginal gain 
is expected over 500. 


] 


Utility 


People 250 


50 
0 
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Definition: 


Calculation: 


Units: 
Source File: 


Query: 


Yield Curve: 


Measure 13. IRR Available 


Individual Ready Reserve (IRR) Available is the number of IRR members living 
in the area of the proposed relocation site. This is a measure of the size of the 
prior service market. 


A geographical query returns the total number of IRR members living within a 
specified distance of the proposed relocation site. This process requires a 
geocoded version of the IRR file. 


TOTAL_IRR[Total Number of Available IRR from the Area] 
People 
IRR 


AreaIRR(MaplInfo) 
SebeC) s71Pe ZIP’, LEFTS(@MOS, 3) “MOS” 
FROM IRR 
WHERE Obj Within objAreaBuffer 

AND ZIPC <= “” AND PMOS <= “” 
ORDER BY ZIPC 
(Note: objAreaBuffer is equal to 300 miles) 


TOTAL IRR 
SELECT COUNT(*) AS TOTAL IRR 
FROM ArealRR 


For a typical unit of 100 people, it is assumed that approximately 40 positions 
could best be filled by IRR members. The recruiting rate for the IRR is 
approximately 1 percent, so an area that offers 4,000 IRR members is assigned an 
average utility of 0.5. Above this point, there are diminishing returns. The 
market begins to exceed the personnel demand of a moving unit and limited 
recruiting efforts become marginally less effective. The utility of smaller 


numbers quickly drops off because of the importance of this source of recruits. 
i 


Utility 


People ee 


Max Utility: > 10,000 Min Utility: 0 
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Definition: 


Calculation: 


Units: 


Source File: 


Measure 14. Recruit Market 


The Recruit Market measure estimates the total number of males who: 

1. live in the area of the proposed relocation site 

2. Would score in the top half on the Armed Forces Qualification Test (AFQT) 
3. Fall into the desired age group (17 - 29 years old) 


This measure sums the entries for all mental categories 1 through 3A, and all 
ethnic groups for the zip codes of interest in the Qualified Military Available 
(QMA) file. The version of QMA used contains only the estimates for males 
within the age range of 17 to 29. 


TOTAL_MARKET|Total Non-Prior Service Personnel from the Area] 


People 


QMA, US. ZIPS(MapInfo) 


ACROPOLIS Table(s): NONE 


Query: 


QMA(MapInfo) 

SELECT LEFTS$(ZIP, 5) “ZIPCODE”, MWCAT12, MWCAT3A, MBCAT12, 
MBCAT3A, MHCAT12, MHCAT3A 

FROM QMA 

WHERE Obj Within objAreaBuffer 

ORDER BY ZIP 

(Note: objAreaBuffer is equal to 300 miles) 


AREA _ZIPCODE(MapInfo) 
SERE@ I~ ZIP CODE 

FROM ~~ US_ZIPS 

WHERE Obj Within objAreaBuffer 
ORDER BY ZIP CODE 


TOTAL. MARKET 
SELECT SUM(MWCAT12+MWCAT3A+MBCAT12+MBCAT3A+ 
MHCAT12+MHCAT3A) AS TOTAL. MARKET 
FROM QMA 
WHERE QMA.ZIP = ANY (SELCET AREA_ZIPCODE.ZIP 
FROM AREA_ZIPCODE) 


105 


Yield Curve: Approximately half of a typical unit of 100 reservists is filled by recruits with no 
prior service. Assuming a recruit rate of 0.25 percent, there must be at least 
20,000 people in the area of the proposed relocation site who meet all of the 
requirements stated above. This value is assigned a typical utility of 0.5. As the 
number increases, there are diminishing returns. The market begins to exceed the 
personnel demand of a moving unit and limited recruiting efforts become 
marginally less effective. 


Utility 


0 250,000 
People ; 


Max Utility: > 250,000 Min Utility: 0 
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Definition: 


Calculation: 


TOTAL _RESERVISTS[Total Number of Available Reservists from the Moving Unit] 


Units: 


Source File: 


Measure 15. Reassignments 


The Reassignments measure indicates the fraction of the reservists assigned to the 
moving unit who currently live within a specified distance (50 miles) of the 


proposed relocation site 


This measure is calculated by first determining all zip codes that lie within a 
specified distance of the proposed relocation site (based upon zip code centroid) 
and then identifying all reservists who both live within one of the identified zip 
codes (based upon the “ZIP” field of the GI8CWE file) and are assigned to the 
moving unit (based upon the “UIC” field of the GI8CWE file). Then the number 
available reassignments is divided by the total number of reservists assigned to 


the moving unit. 


UIC_TOTAL[Total Number of Reservists Assigned Moving Unit] 
Ratio 


G18CWE, US_ZIPS(MapInfo) 


ACROPOLIS Table(s): G18Natl 


Query: 


AREA_ZIPCODE(MapInfo) 
SELECT ZIP CODE 

FROM ~~ US_ZIPS 

WHERE Obj Within objAreaBufter 
ORDER BY ZIP CODE 


G18(Mapinfo) 

SELECT UIC, LEFTS(ZIP,5) “ZIPCODE”, PRI “MOS” 
FROM GI8CWE 

WHERE Obj Within objG18Buffer AND PRI = *”’ 
ORDER BY UIC, ZIP 

INTO G18 


Area G18 ZIP 

SELECT DISTINCT UIC, ZIPCODE, COUNT(UIC) AS UIC_TOTAL 
FROM G18 

GROUP BY “UlCyZIFCODE 

ORDER BY UIC, ZIPCODE 
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Yield Curve: 


TOTAL RESERVISTS 
SELECT SUM(UIC_TOTAL) AS TOTAL RESERVISTS 
FROM Area_G18_ ZIP 
WHERE Area G18 ZIP.UIC = MovingUnit.UIC 
AND Area_G18_ ZIP.ZIPCODE = ANY (SELECT AREA ZIPCODE.ZIP 
FROM AREA ZIPCODE) 


UIC_TOTAL 

SELECT UIC_TOTAL 

FROM ~ GI18Natl 

WHERE G18Nat!.UIC = MovingUnit.UIC 


The current location will always receive a utility score of 0.0 on this measure. For 
relatively close relocation sites, this function was made to be convex, assigning 
high utility values to alternatives that are close to the current location. 


Utility 
0 Potential reassignments / J 
Total number of reservists 
Max Utility: 1.0 Min Utility: 0.0 
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Measure 16. Distance to Area Maintenance Support Activity 


Definition: 


Calculation: 


Units: 


Source Data: 


Distance to the nearest Area Maintenance Support Activity (AMSA) is calculated 
as a proxy measure for response time and support quality. 


The straight-line distance from the proposed site to the closest AMSA is 
calculated using a geocoded version of the AMSA file. 


DIST_AMSA[Determine distance to nearest AMSA Site] 
Miles 


AMSA 


ACROPOLIS Table(s):; NONE 


Query: 


Yield Curve: 


DIST_AMSA(MapInfo) 

SERECT * 

FROM AMSA 

WHERE Obj Withing ObjDistanceBuffer into TempRZA 
(Note: ObjDistanceBuffer is equal to 300 miles) 


SELECT Distance((CentroidX(Obj), Centroid Y(Obj), FacIDLat, FacIDLong, “mi”) 
FROM  TempAMSA 
ORDER BY Distance INTO TempAMSA Dist 


The desirability of a relocation site is relatively insensitive to small changes in 
distance for both close and distant AMSA sites. Little degradation in service is 
expected if the AMSA can have parts and technicians on site within a couple 
hours using a car or truck. It is possible that a trainer that breaks down in the 
morning may be operational for an afternoon training session. At approximately 
200 miles (assigned a 0.5 utility) it starts to become impractical to expect same 
day service and avoid an overnight stay. Eventually it becomes necessary to 
consider flying rather than driving which is likely to further reduce the 
responsiveness and effectiveness of the AMSA. 


1 


Utility 
Miles 500 
Max Utility: 0 Min Utility: > 500 
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Measure 17. Distance to Nearest Equipment Concentration Site 


Definition: 


Calculation: 


Units: 


Source Data: 


Distance to the nearest Equipment Concentration Site (ECS) provides an 
indication of the training time that must be used to travel back and forth. 


The straight-line distance from the proposed site to the closest ECS is calculated 
using a geocoded version of the ECS file. 


DIST _ECS[Determine distance to nearest ECS] 
Miles 


ECS 


ACROPOLIS Table(s): NONE 


Query: 


Yield Curve: 


DIST_ECS(MapInfo) 

SEBEGCT * 

FROM ECS 

WHERE Obj Withing ObjDistanceBuffer into TempECS 
(Note: ObjDistanceBuffer 1s equal to 300 miles) 


SELECT Distance((CentroidX(Obj), Centroid Y(Obj), FacIDLat, FacIDLong, “mi’”’) 
FROM  TempECS 
ORDER BY Distance INTO TempECS.Dist 


The desirability of an Equipment Concentration Site is relatively insensitive to 
small changes in distance for both close and distant sites. Typically, a site that 
can be reached within an hour and ten minutes is not significantly less desirable 
than one that can be reached in ten minutes. An hour of one-way travel time is 
not normally considered to be excessive and allows for most of the time to be 
spent training on a one day training exercise. At approximately 60 miles 
(assigned a 0.5 utility) it starts to become impractical to expect useful training to 
be conducted on a day trip and avoid an overnight stay. Eventually it becomes 
necessary to consider flying rather than driving which is likely to further reduce 
the desirability of the ECS. 


Utility 


0 : 200 
M iles 


Max Utility: 0 Min Utility: > 200 
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Measure 18. Facility Weekends Used 


Definition: | Facility Weekend Usage provides the number of weekends per month that the 
facility is currently in use. This measure treats a facility as a limited resource that 
is incrementally depleted as more units are assigned. Since most units require 
exclusive use of the facility one weekend every month, the number of weekends 
used normally corresponds to the number of units assigned and is typically limited 
to four. 


Calculation: This value is extracted from the “RS WKND_ PM” field of the COMPLEX file. 
WKND_USEDj[Retrieve Number Weekends Facility Used per Month] 

Units: Weekends per month 

Source Data: COMPLEX 

ACROPOLIS Table(s): COMPLEX __ 

Query: WKND_USED 
SELECT COMPLEX _.FAC_WKND_ USED 
FROM COMPLEX _ 
WHERE COMPLEX .FAC ID = ProposedFacility.FAC ID 

Yield Curve: Although some exceptions exist, a typical facility offers no utility to a relocating 
unit if all four weekends are already being used. Although most facilities with 
three units or less should be able to accommodate a new unit and might be viewed 
as having equal utility, other issues such as full time administrative space and 


available equipment storage space make a facility with fewer units currently 
assigned slightly more desirable. 


Util 


0 Weekends / Month 4 


Max Utility: 0 Min Utility: 4 
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Definition: 


Calculation: 


Measure 19. Available MOS from Closing Units 


This measure provides the number of reservists from closing units in the area of 
the proposed relocation site who possess a Military Occupational Specialty 
(MOS) needed by the relocating unit. These people provide a preferred pool of 
trained and qualified recruits. 


The number of personnel records (from the GI8CWE file) that meet all the 
following requirements are counted: 

1. The reservist is assigned to a unit that is scheduled to close (a 
TIER="5B” entry in the G17 file is used to produce a list of closing 
units). 

2. The reservist lives in a zip code in the area of the proposed relocation 
site. 

3. The reservist’s primary MOS is needed by the moving unit. 

If the three MOS groups with the largest number of members in the moving unit 
account for more than 50 percent of the total unit membership, then only those 
three MOS’s are considered. Otherwise all MOS’s required by the moving unit 
are considered as an MOS of interest. 


TOTAL CLOS MOS[Total Number of Available Reservists from Area Closing Units with 


Units: 


Source File: 


MOS’s of Interest] 
Number of people 


COMMAND PLAND, G17, G18CWE, GEOREF, US_ZIPS(MapInfo) 


ACROPOLIS Table(s): CMDPLAN, G17Natl, G18Natl, VALID UIC 


Query: 


Area-FACID List(MapInfo) 

SELECT FAC ID INTO TempFACID 
FROM GEOREF 

WHERE Obj Within objAreaBuffer 
ORDER BY FAC_ID 

(Note: objAreaBuffer is equal to 300 miles) 


VALID _ UIC 

SELECT UIC, FAC_ID, UnitName, City, State, Zip 

FROM ~~ GI17Natl 

WHERE G17Natl.UIC = ANY (SELECT CMD_PLAN.UIC 
FROM CMD PLAN) 


ss 


Area-UIC List 

SELECT DISTINCT UIC INTO AREA UIC 

FROM  VALID_UIC 

WHERE VALID_UIC.FAC ID = ANY (SELECT AREA FACID.FAC ID 
FROM AREA FACID) 


NoAssnxMOS 

SELECT MOS, COUNT(*) AS MOS COUNT INTO NoAssnxMOS 
FROM -~ GI8Natl 

WHERE G18Natl.UIC = MovingUnit.UIC 

GROUP BY MOS 

ORDER BY COUNT(*) DESC 


MOS_TOTAL 
SELECT SUM(MOS_ COUNT) AS MOS_ TOTAL 
FROM ~~ NoAssnxMOS 


MOS_TOP3 
SELECT TOP 3 MOS_COUNT 
FROM NoAssnxMOS 


MOS_INTEREST 

IF MOS_TOP3/MOS_TOTAL < 50% 
SELECT MOS INTO MOS_INTEREST 
FROM NoAssnxMOS 
ORDER BY MOS 

IF MOS_TOP3/MOS_TOTAL > 50% 
SELECT TOP 3 MOS INTO MOS_INTEREST 
FROM NoAssnxMOS 
ORDER BY MOS 


AREA CLOS UIC 
SELECT UIC 
FROM  GI17Natl 
WHERE G17Natl. TIER = “5B” 
AND G17Natl.UIC = ANY (SELECT AREA _UIC.UIC 
FROM AREA UIC) 


AREA ZIPCODE(MapInfo) 
SELECT ZIP CODE AS ZIP 
FROM ~~ US_ZIPS 

WHERE Obj Within objAreaBuffer 
ORDERBY ZIP CODE 
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Yield Curve: 


G18(MapInfo) 

Saree T UIC EF TS(ZIP,5) ZIPCODE”, PRI “MOS” 
FROM GI8CWE 

WHERE Obj Within objG18Buffer AND PRI < “” 
ORDERS Yeuilc, ZIP 

INTO G18 


Area G18 MOS 

SELECT DISTINCT UIC, ZipCode, MOS, COUNT(UIC) AS UIC_TOTAL 
INTO Area G18 MOS 

FROM G18 

GROUP BY UIC, ZipCode, MOS 

ORDER BY UIC, ZipCode, MOS 


Area G18 ZIP 

SELECT DISTINCT UIC, ZIPCODE, COUNT(UIC) AS UIC_TOTAL 
FROM G18 

GROUP BY UIC, ZIPCODE 

ORDER BY UIC, ZIPCODE 


TOTAL_CLOS MOS 
SELECT SUM(UIC_TOTAL) AS TOTAL CLOS MOS 
FROM Area G18 MOS 
WHERE Area G18 MOS.MOS = ANY (SELECT MOS_ INTEREST.MOS 
FROM MOS_INTEREST) 
AND Area_G18_ZIP.UIC = ANY (SELECT AREA _CLOS_UIC.UIC 
FROM AREA CLOS UIC) 
AND Area_G18_ZIP.ZIPCODE = ANY (SELECT AREA_ZIPCODE.ZIP 
FROM AREA_ ZIPCODE) 


The shape of this function assumes diminishing returns on the number of transfers 
available. Experience suggests for an average unit of 100 people, that it is 
unusual to expect more than a third of the members to transfer from closing units 
with the proper MOS. Of the reservists in this category, only half typically 
transfer, so a value of 60 personnel is assigned a utility of 0.9. The incremental 
value added by each additional person over 60 continues to drop until no marginal 
gain 1s expected over 250. 


Utility 


0 People 250 


Max Utility: > 250 Min Utility: 0 
Wy 
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Definition: 


Calculation: 


Measure 20. Available MOS IRR 


This measure provides the number of Individual Ready Reserve members who 
live in the area of the proposed relocation site and who possess a Military 
Occupational Specialty (MOS) needed by the relocating unit. These people 
provide a preferred pool of trained recruits. 


The number of IRR members who possess an MOS needed by the moving unit 
and who live in the area of the proposed relocation site (based upon the zip code 
of their home of record in the IRR file) are counted. Ifthe three MOS groups with 
the largest number of members in the moving unit account for more than 50 
percent of the total unit membership, then only those three MOSs are considered. 
Otherwise all MOSs required by the moving unit are considered as an MOS of 
interest. 


TOTAL IRR MOS[Total Number of Available Reservists from the IRR with MOS’s of 


Units: 


Source File: 


Interest] 
Number of People 


IRR, GI8CWE 


ACROPOLIS Table(s): G18Natl, 


Query: 


NoAssnxMOS 

SELECT MOS, COUNT(*) AS MOS COUNT INTO NoAssnxMOS 
FROM ~~ G18Natl 

WHERE  GI18Natl.UIC = MovingUnit.UIC 

GROUP BY MOS 

ORDER BY COUNT(*) DESC 


MOS_ TOTAL 
SELECT SUM(MOS COUNT) AS MOS_TOTAL 
FROM NoAssnxMOS 


MOS. TOP3 
SELECT TOP 3 MOS COUNT 
FROM — NoAssnxMOS 


MOS_INTEREST 

IF MOS _TOP3/MOS_ TOTAL < 50% 
SELECT MOS INTO MOS_ INTEREST 
FROM NoAssnxMOS 
ORDER BY MOS 


len 


Yield Curve: 


IF MOS _TOP3/MOS_TOTAL > 50% 
SELECT TOP 3 MOS INTO MOS_INTEREST 
FROM NoAssnxMOS 
ORDER BY MOS 


IRR(MapInfo) 

SELECT ZIPC “ZIP”, LEFUR@NIOS,3) ros” 

FROM IRR 

WHERE Obj Within objAreaBuffer and ZIPC <> “”’ AND PMOS <= “” 
ORDER BY ZIPC 

INTO IRR 


TOTAL_IRR_ MOS 

SELECT SUM(UIC_TOTAL) AS TOTAL_CLOS MOS 

FROM IRR 

WHERE _ IRR.MOS = ANY (SELECT MOS. INTEREST.MOS 
FROM MOS_ INTEREST) 


IRR members represent preferred recruits for less than half of the positions of a 
typical moving unit (approximately 40 out of 100) because of issues such as 
seniority and changes in the skills associated with an MOS. The success rate of 
recruiting IRR members is approximately 1 out of 100, so 4000 IRR members in 
the area of the relocation site are required to provide sufficient market to fill the 
40 positions. The value of 4000 is assigned the average utility value of 0.5. As 
the IRR market increases it exceeds the needs of the moving unit and makes the 


limited recruiting efforts marginally less effective. 


| 


Utility 


0 
People a 


Max Utility: > 25,000 Min Utility: 0 
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APPENDIX B. ARIES SOURCE DATA FILE META-DATA 


This appendix contains the meta-data that could be documented for the ARIES 
SDSS project source files. “ACROPOLIS” as used in this appendix refers to the file 
name of the ARIES data resource file. 


Index 
| lvls) Lo Sse 123 
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|| eae Oty L/min or eee 151 
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121 


THIS PAGE LEFT INTENTIONALLY BLANK 


shee 


ARIES Data File Documentation Form 


ARIES File Name: Location: _ ../Aries/MapBasic/USARCData 


File Type: FoxPro 2.6 Size(MB): —.026 No. Records: 190 


Associated ARIES Tables: Not in ACROPOLIS, Geocoded for use in MapInfo 


File Description: 


AMSA File contains information about the location of each AMSA station. It is used in determining the 
value for the distance to the nearest AMSA. 





Required Data Elements 


Data Key 
Name Description Type Format | Field 


[—fac_id | Facility Identification Code | Char | _[No 
fac_title 
fac_street 
fac city 
fac_state C 

fac_ zip 
abbype [SSS «dha 











O1o;/O1o];/O7° 





Extract Queries: 


NONE 
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ARIES Data File Documentation Form 


|} ARIES File Name: COMMAND PLAN Location: | ACROPOLIS 


File Type: FoxPro 2.6 Size(MB): 3.29 No. Records: 9,897 


Associated ARIES Tables: CMDPLAN 


File Description: 


Command Plan is the file that contains information about each unit in the Army Reserve. It is used to 
cross reference FAC ID's with UIC's. It is also used to screen for Valid UIC's with in the next 13 months. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 


[UIC ‘| Unit Identification Code | Char [| —*d|—iyes 
FACID Facility Identification Code | Char | | no | 
EDATE Effective Date of Transaction 


Extract Queries: 
CMDPLAN 
SELECT DISTINCT VIC, FACID AS FAC _ID, 
_ EDATE 

FROM COMMANDPLAN 

WHERE (FACID < "N/A") AND (FACID <— 
"TBD") AND (FACID @ "") AND 
(LEN(FACID) > 2) AND 
((LEFT(EDATE,4) = '1998' AND 
MID(EDATE,5,2) <= '02') OR 
(LEFT(EDATE,4) <= '1997')) 

ORDER BY UIC, EDATE DESC 

INTO CMDPLAN 

INDEX ON UIC as UIC 

Note: Application automatically adjusts the 
dates to obtain a 13 month window. 
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ARIES Data File Documentation Form 










i ARIES File Name: COMPLEX Location: ACROPOLIS 





File Type: FoxPro 2.6 Size(MB): 2.1 No. Records: ios 7 











Associated ARIES Tables: COMPLEX _ 





File Description: 


The Complex File is used to determine if the facility is owned by or leased to the government and the 
number of weekends each facility is used during a month. 







Required Data Elements 


Data | Key 
Name Description Type Format | Field 


FAC ID Facility Identification Code i Ghar "|= «saves | 
GOVT _OWN Facility ownership status | Char | YN | no | 


RS _WKND PM Reserve Station weekend usage per mo. 





Extract Queries: 

COMPLEX _ 

SELECT FAC_ID, GOVT OWN AS 
FAC OWNED, RS_WKND PM AS 
FAC WKND_USED 

FROM COMPLEX 

WHERE LEN(FAC ID)=5 

INTO = COMPLEX _ 


INDEX ON FAC _ID as FACID, Primary, Unique 
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ARIES Data File Documentation Form 





ARIES File Name: ECS Location: ...\MapBasic\USARCData\ 
File Type: FoxPro 2.6 Size(MB):  .004 No. Records: 30 


Associated ARIES Tables: Not in ACROPOLIS, Geocoded for use in MapInfo 


File Description: 


ECS File contains information about the location of each Equipment Center. It is used in determining the 
distance to the nearest ECS. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 


[—fac_id | Facility Identification Code [Char |_| No 
[—factitle | FacilityTile__———~—S~S~S~S Char —«dSSSS*dN 
fac_stree 
fac city 
Tac. Zip 


abb_type 


rol 
ae 
Bal 
ma 
Ey 
= 


Extract Queries: 


NONE 
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ARIES Data File Documentation Form 





—— = 


| ARIES File Name: FINANCE Location: ACROPOLIS 
File Type: FoxPro 2.6 Size(MB): 83.4 No. Records: 311,793 


Associated ARIES Tables: FINANCE , FINANCE QTR 


File Description: 


Finance is the file that contains pay information for the previous eight quarters about every Reservist. It is 
used to obtain information about Drill Attendance for a given Facility. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 


Number 
ee ee 
«Ls sali || TA lg a a 
[| ee | ge ae a 
a 
aa 
Extract Queries: 
FINANCE | FINANCE QTR 
SELECT "W" & LEFT(CURR_UIC,S5) AS UIC, SELECT "W" & LEFT(CURR_UIC,S5) AS UIC, 
COUNT(CURR_ UIC) AS UTAIQCFY, UTA2QCFY, UTA3QCFY, 
UIC_TOTAL UTA4QCFY, UTAI1QIPF, UTA2QI1PF, 
FROM FINANCE UTA3QI1PF, UTA4Q1PF 
WHERE CURR_UIC @&"" FROM FINANCE 
ORDER BY CURR_UIC WHERE CURR_UIC @" AND NPS_IND = NULL 
GROUP BY CURR_UIC AND PAY _ STAT ='A' 
INTO FINANCE _ ORDER BY CURR UIC 


INDEX ON UIC as UIC INTO FINANCE QTR 
INDEX ON UIC as VIC 
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ARIES Data File Documentation Form 









ARIES File Name: FPS Location: ACROPOLIS 


File Type: FoxPro 2.6 Size(MB):  .088 No. Records: 1,561 


Associated ARIES Tables: FPS . 


File Description: 


FPS is used to obtain information about the Cost to operate each facility as well as the Condition of each 
Facility. Used to return a value for the Cost per Square Foot and the Facility Condition. 







Required Data Elements 


Data Key 
Name Description Type Format | Field 


FAC ID Facility Identification Code | Gor ap. |_ No @ 
FAC COND Condition of the Facility loneha ae | Now 
COST _PR_SF Cost per Square Foot to Operate Facility 









© 








Extract Queries: 
FPS _ 
SELECT FAC_ID, FAC_COND, COST_PR_SF 
FROM FPS 
WHERE FAC IDo"™ 
ORDER BY FAC ID 
INTO FPS_ 
INDEX ON FAC _ID as FACID, Primary, Unique 
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ARIES Data File Documentation Form 









FYxxLOSS 


Location: ACROPOLIS 





ARIES File Name: 




















File Type: FoxPro 2.6 Size(MB): 85.4 No. Records: 260,000 


Associated ARIES Tables: FYxxLOSS, FYxxXFER 





File Description: 


FYxxLOSS file contains information about the personnel losses incurred by each unit during a fiscal year. 
It is used to determine the Average Loss and Transfer Rate of a Unit. 





Required Data Elements 
Data Key 

UIC | Unit IdentificationCode | Char | | No_ 
| TRMN | TransferReasonCode CTS Char | | No 
a _ EE ie, eee 2 ee eee 
en in) 
= S|  - eee ee ee ee 
it~. Si fl ey Ll le 
a ee ee Le 
i ee ae ee Pe 
SSeS U | eee eens (eet I eeeryi 
(= a3 | See eee | ees ee a 
a eee ee ee RE ee Pe 
(a, eee ee eae ee 
i ew, | = 






Extract Queries: 


FYxxLOSS FYxxXFER 

SELECT UIC! AS UIC, COUNT(UICI) AS SELECT VICI AS UIC, COUNT(UICI) AS 
UIC_TOTAL UIC_TOTAL 

FROM FY_LOSS FROM FY_LOSS 

WHERE TRMN ='LOSS' WHERE TRMN ='TRFD' 

ORDER BY UIC] ORDER BY  UICI 

GROUP BY UIC] GROUP BY _ UICI 

INTO FYxxLOSS INTO FYxxXFER 


INDEX ON UIC as UIC, Primary, Unique INDEX ON UIC as UIC, Primary, Unique 
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ARIES Data File Documentation Form 





| ARIES File Name: G17 Location: | ACROPOLIS 
| File Type: FoxPro 2.6 Size(MB): 3.11 No. Records: 5,869 


i Associated ARIES Tables: - GI7Natl 


File Description: 


G17 file contains facility Unitname, street address data and Zip Code. It is used as the primary cross 
reference with Command Plan to display facility information and validate user input. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 
I 


[UIC | Unit Identification Code —~~‘| Char {| No 
[UNITNAME [Name ofthe Unit_——~—~+| ~Char_[ [No 
[TCCcrty | City Unitislocatedin——~—~S~wdSCS har «dN 
[No 
[No 
No 
No 


C 
eae IE) © en ae re ae DR om ad, 
a ee ae oe 
|S OO" ee ae eee eee | 
it ae oo Te ck Ger lL ee 


— 
as: 
a 
a. 

[—TYPEORG | Type of organization | Number |__| No 
eee 
ay, 
a, 
ee 


a sont a - 


Extract Queries: 

GI7Natl 

SELECT UIC, UNITNAME, TCCCITY AS 
CITY, TCCSTAT AS STATE, 
LEFT(TCCZIP,5) AS ZIP, TIER 

FROM GI7 

WHERE (RECSTAT <> "I") AND (TYPEORG 
<> "2") AND UIC &"" 

ORDER BY UIC 

INTO  GI17Natl 

INDEX ON UIC as UIC, Primary, Unique 
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| ARIES File Name: 


File Type: FoxPro 2.6 


| Associated ARIES Tables: 


File Description: 


ARIES Data File Documentation Form 


Location: ACROPOLIS;..\MapBasic\UsarcDat 
a 


Size(MB): 145.9 No. Records: 208,416 


G18Natl, GI8Natl_UIC; also Geocoded for use in MapInfo 


G18 File contains information about personnel in the US Army Reserves. It is used in determining the 
Total Number Assigned used in calculating the Loss/Transfer Rates, Total Available Closing and the 
Reassignments values. Also used to obtain a list of the Zip Code’s and MOSs of every Reservists with their 


associated UIC.. 


Extract Queries: 
G18Natl 


Required Data Elements 


Data 
Description Type Format 


Unit Identification Code assigned 
Zip Code of the individual 


Primary MOS 


i 


G18Natl_ UIC 


SELECT UIC, LEFT(ZIP,5) AS ZIPCODE, SELECT UIC, COUNT(UIC) AS VIC_TOTAL 
LEFT(PRI,3) AS MOS FROM -~ G18Natl 


FROM GI18_ 


ORDER BY UIC 


WHERE PRI <> "" AND VIC @&"" GROUP BY UIC 


ORDER BY UIC 
INTO GI18Natl 
INDEX ON UIC as UIC 


INTO G18Natl UIC 
INDEX ON UIC as UIC, Primary, Unique 
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ARIES Data File Documentation Form 












ARIES File Name: 





GI9TRUE Location: ACROPOLIS 













File Type: FoxPro 2.6 Size(MB): 14.4 No. Records: 233,211 


' Associated ARIES Tables: G19Natl 








File Description: 


G19 File contains information about the required manning levels of each Unit. It is used in determining 
Average Area Manning for a Facility. 






Required Data Elements 


Data Key 
Name Description Type Format | Field 


OWN_UIC Unit Identification Code Char | ~«| No | 





Extract Queries: 

G19Natl 

SELECT OWN_UIC AS UIC, 
COUNT(OWN_UIC) AS 
UIC _ TOTAL 

FROM G19 

WHERE OWN _ UIC @&"" 

ORDER BY OWN UIC 


GROUP BY OWN_UIC 
INTO G19Natl 
INDEX ON UIC as VIC 
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ARIES Data File Documentation Form 


ARIES File Name: GEOREF Location: ACROPOLIS;..\MapBasic\UsarcData 
File Type: FoxPro 2.6 Size(MB):  .21 No. Records: a3) 


Associated ARIES Tables: VALID_UNIT; also Geocoded for use in MapInfo 


File Description: 


Georef File contains specific information about each Unit. It is used to verify and cross reference FACID's 
and UIC as well as Facility and Unit specific information. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 


City the Facility is located in 

State the Facility is located in 

Zip Code of the Facility 

Number 


. e 
. . 
— - a “Ar = eee 


a, 


Extract Queries: 

VALID UNIT 

SELECT FAC_ID, FAC_TITLE AS 
UNITNAME, FAC CITY AS CITY, 
FAC STATE AS STATE, 
LEFG@PAC ZIP,5) AS ZIP 

FROM GEOREF 

WHERE FAC ID" 

ORDER BY FAC ID 

INTO VALID UNIT 

INDEX ON FAC _ID as FACID 
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ARIES Data File Documentation Form 






| ARIES File Name: INTEREST Location: | ACROPOLIS 
| File Type: FoxPro 2.6 Size(MB): 4.2 No. Records: 3,985 


| Associated ARIES Tables: INTEREST _ 


File Description: 


Interest File contains information about facilities and the date they were acquired. It is used to calculate 
the Facility Age for each facility.. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 


FAC_IDSTR Facility Identification Code r Gar i | ao 
DATE ACQ Date Facility Acquired | Date | == | No | 
ABB_TYPE 


°) 


Extract Queries: 
INTEREST _ 
SELECT FAC_IDSTR AS FAC_ID, DATE_ACQ 
FROM — INTEREST 


WHERE FAC_IDSTR <@"" AND ABB _ TYPE = 
"USARC (MB)" AND NOT 
ISNULL(DATE_ACQ) 

ORDER BY FAC_IDSTR 

INTO INTEREST_ 

INDEX ON FAC _ID as FACID, Primary, Unique 
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ARIES Data File Documentation Form 





—- | a — —_ 
| ARIES File Name: IRR Location: ...\MapBasic\UsarcData 


| File Type: FoxPro 2.6 Size(MB): 7.5 No. Records: 140,077 
Associated ARIES Tables: Not in ACROPOLIS, Geocoded for use in MapInfo 
File Description: 


IRR File contains information about the individuals listed in the Individual Ready Reserve. It is used to 
determine the value for IRR Available and Available MOS IRR. 


Required Data Elements 


Extract Queries: 


NONE 
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ARIES Data File Documentation Form 







ARIES File Name: NGNON CL Location: ...\MapBasic\UsarcData 





File Type: FoxPro 2.6 Size(MB): .64 No. Records: 3,673 









Associated ARIES Tables: Not in ACROPOLIS, Geocoded for use in MapInfo 





File Description: 


NGNON CL File contains information about the non-closing National Guard Units. It is used in 
determining the value for Competition. 







Required Data Elements 






Data Key 
Ea See ee eC a 
| ZIP si Zip Code for National Guard Individual | Char_ | {Yes 
See. 2 eee eee eee ee 
| See eee eee eee ee 
ee) See eee 
aa a ae Pe eee 
Pe ee ees UD ee 
La “ae ee we OO nde FT 
a LO ah alt = om fh al 
a eee eee ea ea ea 
oe ee eee ee oy cee SS ee 
a eee Te ce ae De 
ha a a eee ee, eee ee 


Extract Queries: 


NONE 
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ARIES Data File Documentation Form 







ARIES File Name: QMA Location: ... MapBasic\UsarcData 










File Type: FoxPro 2.6 Size(MB): 2.8 No. Records: 34,265 


Associated ARIES Tables: Not in ACROPOLIS, Geocoded for use in MapInfo 





File Description: 


QMaA File contains Census information. It is used in determining the value for Recruit Market for each 
Facility. 






Required Data Elements 


Data Key 
Name Description Type Format | Field 


[—zip———‘| ZipCode SSCS har | «dC 





[“MWCATI2 | ‘White Male Mental Categories 1&2 | Namber_ [| “No 
[—MWCATSA | White Male Mental Category 3A | Number |_| “No 
[“MBCATIZ | Black Male Mental Categories | 2 | Number_[ | “No 
[MBCAT3A | Black Male Mental Category 3A | Number [| "No 
[—MHGATI2 | Hispanic Male Mental Categories 12 | Number [|_| No 
Hispanic Male Mental Category 3A | Number | == {No _| 
rrr Tees | 2 LP eee ee. 
a ee er eee heme’ | Pin? of Lameed 
Peo ye a eee ae ee 
ee ee YS eS a ee 2 eee 
Len ee | eee ee ea 
a 2 te eee ee 


: 


Extract Queries: 


NONE 
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ARTES Data File Documentation Form 





| ARIES File Name: RPINFODT Location: ACROPOLIS 
| File Type: FoxPro 2.6 Size(MB): 14.3 No. Records: 


| Associated ARIES Tables: FPINFODT _ 


File Description: 


RPINFODT is a file that contains information about the backlogged maintenance costs of each Facility. It 
is used to determine the amount of backlogged maintenance is required at the given Facility. 


Required Data Elements 


Data Key 
Name Description Type Format | Field 


FAC ID Facility Identification "Cae | = | ae | 
CWE_TOTAL _ | Total amount of outstanding Maint. Actions | Number | == | ~No | 


Extract Queries: 
RPINFODT_ 
SELECT FAC _ ID, SUM(CWE_TOTAL) AS 
MAINT_COST 
FROM RPINFODT 
WHene TAC Ip" 
ORDER BY FAC ID 
GROUP BY FAC _ID 
INTO RPINFODT_ 
INDEX ON FAC _ID as FACID, Primary, Unique 
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ARIES Data File Documentation Form 










ARIES File Name: RZA Location: ... \MapBasic\UsarcData 









File Type: FoxPro 2.6 Size(MB):  .16 No. Records: 1,793 





Associated ARIES Tables: Not in ACROPOLIS, Geocoded for use in MapInfo 





File Description: 


RZA File contains information about the location of Recruit Stations. It is used to determine the distance 
to the nearest Recruit Station. 





Required Data Elements 


Data Key 
Name Description Type Format | Field 


rsid Recruit Station Identification Code Char 
name Recruit Station Title 


en. | 
=a 
zip Zip Code of the Recruit Station aCe eee 
ras 
pees 


a | 


latitude osition of Recruit Station by latitude 
longitude osition of Recruit Station by longitude 





Extract Queries: 


NONE 
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APPENDIX C. ARIES DECISION MEASURE STATISTICS 


This Appendix contains the statistics calculated for the USARC data set. Validity 
and Frequency Statistics were calculated for 17 of the 20 measures that were not 
dependent on knowing the identification of the Moving Unit. 

Individual percentages in the frequency distributions for the 17 measures in this 
appendix are percentages relative to total non-missing values. 

The limits of the ranges for valid values were determined by using a rule of 
reasonableness to identify values that would adversely affect the evaluation process. 
Consideration was given to the following areas; (1) the range of values returned during 
the evaluation process, (2) expected values based on the Yield Curves and (3) common 


sense (i.e., 0 value). 


Index 
PARE ECS CMO CUMMEISIICS ........25....ss0cnecsenessesecccccnssnsseesccessensocconnanecssseessansgduonsabereenscots 159 
PAPO OICASUECS MIAN SIS........0.0000secscccscccccceseccccccecensensscesceaseceensvecssncscsesssessossnensoecesstenens 161 
Measure |” Faekity Backlogeed Maintenance.......................0sseccvssessssssosscossossnnsasovveanens 163 
aeemhe ) Migieility @ peuabiti® COSUS.........1..........se0eessenscecsaccesocssocasecnenntdacdesssstQMQnreeeecsens 165 
BAe race we aN 6 cs face cates cecdeoseycdovesesas ascosecceccessarscnsnnenenesaconsscevesedooevetsccsestecseneeeme 167 
Deals As Monte Te AS INNIS MIGUIBION]...........0 cc. cccsccecececccccccccsecccnsesceccceseseuesesssescescessossesaersccecsneann 169 
Reema te MU MICTOIIND «..0.........ccccgeeeesssessssvennncccssnecnccssnacaaeecsareoncasssvassgeveccraceseceee 171 
Tes ete ie COMMON ETN. |«-.~520-.-0.c0cac0-2+seseenvensiecsacenescnunnesnnseestecds eoneuenecesssdeatammeeeee eee neenenG |e. 
Poeeweeny we VCTAeG ed DTMIOATICMGANCE .................:.cccsesesseoosacsncncccersederrneroensacosceseal 175 
TeESNTOMC (Wie MIMO GaN ey 5,,.. cicccdes.....2dscceeteccass.oeseannisnnsonscescaduueesesss soe sceneccenenslueguuealouians a 
Peewee ne AMSG MMMEIMGRCTUR ALC ..................cccccseeecenesecceccnnccetserccnssecccescuneassessssessnsesseansesen 179 
Measure 10, Ated Average MaWMINg ....................ccccessssonesecccecccnsebansasessseevsssunnnsuassgansees 181 
Measure lil Distance to Nearest’ Recruit Station ....................ccccccs-nspgeeteagaecssoseocsecersesees 183 
Measure 12. Available Transfers from Closing Units 20.0.0... cccesssesssssssssesceeeeeceeeeeees 185 
Sects tine | SMe AiUOMG........210 00M emaanauel asics .c-ss-.0-.00s04e0de0ascndacasesaaneeeneteen maaan team 187 
Pltedetine |4aRicommit Veal. ...........+...+0948ghdeses.+.sscoesesgugdesacseessadegedessas+soo oss aiaaeens:.... amma 189 
Measure 16. Distance to Nearest Area Maintenance Support Activity ...............ccceeeeee 191 
Measure 17. Distance to Nearest Equipment Concentration Site .............cccccseesceeeeeeeees 193 
fiers | Owetdenmiy WeKeNGS USE «......1...........00ccccessdensesscacnerecserseosoesedeccconcecacecteeriens 25 
DRO a NN cs csees PENTA 5.0 nica gantensnn Stan etar eee cccaessunuecenscevcocet 1 leadat enCunte ean tn a eee 197 
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ARIES Descriptive Statistics 
1325 U.S. Army Reserve Facilities 


Observations| Min Max Std 
(N) Value Value lation 
1205 11,979,371| 448,131 
1,251 293.5 3.0865 | 9.7124 
3 Facility Age 1,677 295.1 3.3 
4 Facility Condition 1,251 N/A IA 
5 Facility Owned 1,319 /A 
19 04 







4 
- 
o) 
. 
fe) 
ae 


a 


N/A N 
N/A N 
6 Competition 18 4,116.3 | 3,960.0 
¢? Area Drill Attendance 0.2 

8 Area Loss Rate 0.11 


en) 


9 Area Transfer Rate 0.20 
10 Area Average Manning 0.20 
11 Distance to Recruiter 287.7 
12 Area Avail Closing Unit 114.1 
13 IRR Available 1 658.9 
14 Area Recruit Market Zoom 21 
15 *Reassignments ag. “=e i 
46 DistancetoAMSA—|_—1,325_—«(|_~o | 7,619.9 | 424 | 289.1_ 
47 DistancetoECS | 1.325 | 0 | 5,290.9 | 268.1 | 510.1 | 
18 Facility Weekends Used | 1.300 | o | 3 | 16 | 10 __ 


19 *Avail MOS Closing Units 


| 2 Facility Operating Cost 


5] ° 


f 
~] 
iS) 
CO 
ie) 
Go 
=> 
CO 
CO 
CO 
a aN 
— 
NO 
CO 
O 
i: 











20 *Available MOS IRR 





* Moving Unit Specific Measures 


Observations Std 
(N) Deviation 








lag 
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ARIES Measures Analysis 
1325 U.S. Army Reserve Facilities 


| Out of 
| Range 


a ee ‘ 


1 Facility Backlogged Maint. 










2 Facility Operating Cost 
4 Facility Condition 

5 Facility Owned 

6 Competition 

7 Area Drill Attendance 
8 Area Loss Rate 

9 Area Transfer Rate 

10 Area Average Manning 
11 Distance to Recruiter 
12 Area Available Closing Unit 
13 IRR Available 


14 Area Recruit Market 


15 *Reassignments 

16 Distance to AMSA 

17 Distance to ECS 

18 Facility Weekends Used 


19 *Available MOS Closing Units 


f 
! 


20 *Available MOS IRR 


* Moving Unit Specific Measures 


42.3% 
5.6% 
0.5% 
1.9% 
1.8% 
0.0% 
1.9% 
0.0% 
0.0% 

38.2% 
0.8% 


0.7% 


0.0% 
0.0% 
0.4% 
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0.2% 
0.0% 
0.0% 
0.0% 
0.0% 
2.1% 
1.1% 
2.3% 
0.2% 
0.0% 
0.0% 
0.0% 


0.2% 
10.6% 


0.0% 





Valid 


97.6% 


94.4% 
99.5% 
98.1% 
98.1% 
97.9% 
97.0% 
97.7% 
99.8% 
61.8% 
99.2% 
99.3% 


99.8% 
89.4% 
99.6% 


Potentially | 
















O<x1<20M 





O< x2 < 100 


— 
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Measure 1. Facility Backlogged Maintenance 


Frequency Data 
Values Cumulative Cumulative 
(Millions) — Frequency Percent 


a Non- 4205 
Missing 
Missing 


Total 4325 








Descriptive Statistics 


1205 —_ i 11,979,371 448, 130.8 837,390.9 


Max Utility: 0 
Min Utility: 1,000,000 
Databases: RPINFODT 
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Measure 2. Facility Operating Costs 


























: Frequency Data a a | 
| Frequency | percent | Frequency | Percent 
Frequency Frequency Percent | 
ao fm | ss | me [oe 
(sor [0 | oo | wo | oo 
100-20 |e | ot | vs] 
SS OS 
ee 

74 » St ae 


Missing 


er ea a — 


1 
3) 
3 
& 


Total 1325 





Descriptive Statistics 


0 294 3.1 SI 





Max Utility: 0 
Min Utility: 100 
Databases: FPS 
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Measure 3. Facility Age 


















Frequency Data 
Values Cumulative | Cumulative 
(Months) Frequency <2 Frequency Percent 

seat 


ees 
Ce 





401 - 500 
501 - 750 
751 - 1000 
1001 - 1500 
1501 - 2000 
> 2000 


=a! 
Total Non-Missing 765 —— = 
Missing ; 0 


86 


A 








Descriptive Statistics 


Max Utility: 0 
Min Utility: 1,200 
Databases: INTEREST 
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Measure 4. Facility Condition 














: ~ Frequency Data , 
Frequency Frequency Percent 
| GREEN | 1251 | | 100.0 = | | 











Missing 


Green Utility: 1.0 
Amber Utility: 0.5 
Red Utility: 0.0 
Databases: FPS 
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Measure 5. Facility Ownership 


| Frequency Data . 
—_— Frequency Frequency Percent 
= 110 83. . oa | 11 = 


TouiNonwssna | | | id 


Missing 





Max Utility: YES 
Min Utility: NO 
Databases: COMPLEX 


17] 
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Measure 6. Competition 












Frequency Data. 7 3 | 

Values Cumulative | Cumulative | 

(People) Frequency Percent Frequency Percent 
a ee 


309 

519 

2,004 - 3,000 685 
3,001 - 4,000 129 814 
4,001 - 5,000 52 e666 | | 66 
: 


7,501 - 10,000 


Fouinonatesing | mo) SSC 
> | a 


Missing 


Total 4325 












Descriptive Statistics 


ee oo 
1300 20.759 4116.3 3960.0 


Max Utility: 0 
Min Utility: 10,000 
Databases: COMMAND PLAN, G17, G1I9TRUE, GEOREF, NGNON_ CL 





Ns: 
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Measure 7. Average Area Drill Attendance 


















es ee Se 


Cumulative | Cumulative | 
Frequency Percent | 


Frequency Data 
Values 
(Percent) Frequency 





0.91 - 0.99 


0.31 - 0.40 
0.41 - 0.50 
0.91 - 0.99 03 04.1 
0.61 - 0.70 472 36.3 
0.41 - 0.50 
0.91 - 0.99 


aan 


1267 


0.91 - 0.99 100.0 
= 1.0 100.0 
Total Non- Missing 1300 | o ale 


Missing 


I7 33 


Co 
NO 


~I 
Bay 
o|N on 
aS| ch ~ 
S| =) S| eS O 
oo) Sas | ° 












Descriptive Statistics 


Max Utility: 1 
Min Utility: 0 
Databases: COMMAND PLAN, FINANCE, G17, G19TRUE, GEOREF 
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Measure 8. Area Loss Rate 





0.81 - 0.90 


0.31 - 0.40 
0.41 - 0.50 
0.81 - 0.90 
0.61 - 0.70 
0.71 - 0.80 
0.81 - 0.90 
0.81 - 0.90 


0 


Missing 












Descriptive Statistics 


Max Utility: 0 
Min Utility: 1 
Databases: COMMAND PLAN, FYxxLOSS, G17, G18CWE, GEOREF 
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Measure 9. Area Transfer Rate 






























| Frequency Data _ 7 —— | 
oe nr —— I 
| (Percent) Frequency Frequency Percent | 
eo. s | | kT 
P—eor-ono [tos [ie -| 200 ~—=*d?SCtsw 
-eat-o20 | oe | mez | oe «| 
(est-oso | ae | as | ee | oso 
[—os-oso | om | ss «| es fear 
[estos | | ae (| ter ‘| een 









Ce 
a 
— Sa 


Total Non-Missing 1300 


25 
325 | 


56 
3.4 
eZ 
372 


Missing 









Descriptive Statistics 


Max Utility: 0 
Min Utility: >= 1 
Databases: COMMAND PLAN, FYxxLOSS, G17, G18CWE, GEOREF 
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Measure 10. Area Average Manning 









i ~ Frequency Data —- a oe 
| pereon | Freaueney | percent | Frequency | Pereant 

(Percent) Frequency Frequency Percent | 
a 
0.51 - 0.75 
0.76 - 0.80 
0.81 - 0.90 
1.01 - 1.20 









1.61 - 1.20 
6.81 - 6.90 









Descriptive Statistics 


Max Utility: 1.25 
Min Utility: 0 
Databases: COMMAND PLAN,G17, G1I8CWE, G1I9TRUE, GEOREF 
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Measure 11. Distance to Nearest Recruit Station 


SSS os 


— —", ~__ he@ianceahata—.——________ —_—=oml Frequency Data | = 1 

Values Pride, | ont, | Cumulative | Cumulative 

(Miles) Frequency Frequency Percent 
<i |  . ae 








> 50-75 


: 
© 
' 


> 100 - 200 











Descriptive Statistics 


Max Utility: 0 
Min Utility: >= 100 
Databases: RZA 
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Measure 12. Available Transfers from Closing Units 











3 Frequency Data 
: (People) Frequency Frequency 
: 0 bene 4 | ee. 






Cumulative | 
Percent 


1 - 50 


«ee 


251 - 300 


ce 


> 500 819 100.0 


Total 4 329 












Descriptive Statistics 


Max Utility: 250 
Min Utility: 0 
Databases: COMMAND PLAN, G17, G18CWE, GEOREF 
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Measure 13. IRR Available 











Frequency Data 
Values Cumulative Cumulative 
(People) Frequency Frequency Percent 


32.3 


ae 


ot ae 
Ee 


501 - 1,000 


© 
=) 









“Total 


4325 









Descriptive Statistics 


1315 —— 3.497 395.8 658.9 


Max Utility: >= 10,000 
Min Utility: 0 
Databases: IRR 
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Measure 14. Recruit Market 



















Cumulative | Cumulative | 
Frequency Percent 





Frequency Data 


Values 
(People) Frequency 


ae [rs [se 






0 - 5,000 
5,001 - 10,000 
10,001 - 25,000 


18.9 


249 

296 
> s0000 | 0 | 00 | sie | 1000 
TouNonatesne | ome | | 
[ oe | « | | es 












Descriptive Statistics 


1316 214.738 33.189.9 41.290.4 


Max Utility: >= 250,000 
Min Utility: 0 
Databases: QMA 
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Measure 16. Distance to Nearest Area Maintenance Support Activity 


Frequency Data 


> 10-20 


[Rastievaeeing [ves 


Total 1325 












Descriptive Statistics 


Max Utility: 0 
Min Utility: >= 500 
Databases: AMSA 


11 


THIS PAGE LEFT INTENTIONALLY BLANK 


2 


Measure 17. Distance to Nearest Equipment Concentration Site 











-———s«*érequency Data 7 
Values tan | Cumulative | Cumulative 
(Miles) Frequency Frequency Percent 

Ae ae 





o> 
oR) 


>0-10 
> 10-20 
> 20 - 30 
> 30 - 40 
> 40 - 50 
> 50-75 


84 
38 
84 
> 75 - 100 44.4 


> 100 - 200 
> 200 - 300 
> 300 100.0 


: Total 4325 


Oo 

oak 

(Se) 

© 

On 
“J | BR io isl, mos =] |] —|] — 
NI ALAIN] oO] a@la}nr 
HR |} PO] GO] oO] ©] aol n 


— 
wo 
aA 
om | oh 
Gs! 
NO | OD 
on} & 
© 
~J 
ee) 









Descriptive Statistics 


1325 p00 5290.9 268.1 510.1 


Max Utility: 0 
Min Utility: >= 200 
Databases: ECS 
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Measure 18. Facility Weekends Used 






Frequency Data 


Cumulative 
— F — 










Cumulative | 
Percent 


Total 1325 






Descriptive Statistics 


Max Utility: 0 
Min Utility: 4 
Databases: COMPLEX 
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Query Time 











= —a— 2 = 





Frequency Data — 


| Values Frequency Cumulative | Cumulative | 
| _ (Minutes) Frequency Percent 


15.1 





__ 


2 oy 
Ours 
a0 [8 
2 [| | «0 
er [2 












Descriptive Statistics 
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APPENDIX D. SOURCE FILE DOCUMENTATION FORMS 


This appendix contains the recommended forms to be used in gathering meta-data 


information for data files used in conjunction with developing a SDSS application. 
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Source File Documentation Form 


Customer Path & File Name: LAN Server: 


Point of Contact(Office & Phone): 


File Type: Size(MB): No. Records: 
Source File Name: Update Frequency: 


Source File Location: POC: Phone: ' 


Application File Name: 


File Description: 





Queries Used: 


Page 1 of 2 
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Source File Documentation Form 











Source File Name: Location: 








File Type: Size(MB): No. Records: 
Associated Tables: 


File Description: 













Required Data Elements 





Data Key 
Type Format | Field 


Description 





Extract Queries: 





Page 2 of 2 
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Wz 
IN5. 


14. 
| 
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